Social media can be a tremendous force for good during an emergency. News agencies and local governments can disseminate urgent alerts in an instant, providing valuable information to a distraught public. Unfortunately, as services like Twitter are user-driven and unfiltered, plenty of errant info can spread like wildfire too.
Someone’s pants are on fire here, as this totally untrue tweet went out during the London Riots of 2011.
Whether it was during the London Riots or Hurricane Sandy, humanity just can’t shake the penchant for anarchy. Or can it? Chilean researchers say they have developed a new tool that can help keep the misinformation at bay, at least on Twitter.
Carlos Castillo, Barbara Poblete and Marcelo Mendoza claim to have developed an algorithm that can sniff out truthful news tweets and identify bogus ones. It works on the premise that “credible” news tweets share common traits, and that “non-credible” ones share traits of their own. In all, the team identified 16 characteristics and plugged them into the formula, which it used to filter and sort tweets according to credibility.
In the paper “The Power of Prediction with Social Media,” which is slated to be published in the journal Internet Research early next year, the authors describe impressive results from experiments that examined several thousand messages on the microblogging platform. They applied the algorithm to tweets from a non-emergency timeframe (several days in spring 2010) and found that it identified the true tweets 86 percent of the time. When they sampled tweets sent during and right after the 2010 Chilean earthquake, the score remained fairly steady, at 82 percent.
Castillo, Poblete and Mendoza haven’t disclosed the specific 16 characteristics (likely because, if everyone knew what they were, it would be exceedingly easy to game the algorithm), but they did offer some general info about the nature of true and false news tweets:
Credible tweets…
- tend to be sent by users with a lot of followers
- tend to have a negative sentiment
- are generally lengthier
- are more likely to include URLs (especially links to the 10,000 most visited domains)
Non-credible tweets…
- are more likely to have question marks and exclamation points
- tend to use first- and third-person pronouns
Other areas they considered include emoticon usage, user mentions, number of retweets and how often the user previously tweeted about the topic. This lines up with findings from India’s Institute of Information Technology. Researchers there found that credible tweets were less likely to include swear words, and non-credible ones were far more likely to have frowny emoticons than smiley faces.
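Most of the traits described above are surface signals that can be counted straight from a tweet. As a rough illustration only, the Python sketch below counts a handful of them. It is not the researchers' algorithm: the 16 characteristics have not been disclosed, and the names `TweetFeatures` and `extract_features`, the pronoun lists and the URL pattern are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns and word lists, chosen for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"[a-z]+")
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers",
                "they", "them", "their", "theirs"}


@dataclass
class TweetFeatures:
    """A few of the publicly described signals, as raw counts."""
    length: int
    url_count: int
    question_marks: int
    exclamation_points: int
    first_person_pronouns: int
    third_person_pronouns: int
    follower_count: int


def extract_features(text: str, follower_count: int) -> TweetFeatures:
    """Count simple surface signals in one tweet (assumed feature set)."""
    urls = URL_PATTERN.findall(text)
    # Strip URLs first so a "?" inside a link is not counted as punctuation.
    body = URL_PATTERN.sub(" ", text)
    words = WORD_PATTERN.findall(body.lower())
    return TweetFeatures(
        length=len(text),
        url_count=len(urls),
        question_marks=body.count("?"),
        exclamation_points=body.count("!"),
        first_person_pronouns=sum(w in FIRST_PERSON for w in words),
        third_person_pronouns=sum(w in THIRD_PERSON for w in words),
        follower_count=follower_count,
    )


if __name__ == "__main__":
    sample = "OMG is it true?! They say the zoo is on fire!!"
    print(extract_features(sample, follower_count=12))
```

Counts like these would only be inputs. Turning them into a credibility score would still require a model trained on tweets that people have labeled as true or false, which this sketch does not attempt.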
Of course, no algorithm is 100 percent accurate, which means human beings will have to continue doing what they’ve always done: take any unofficial info with a healthy dose of salt.