Twitter Twisting: Research Behind COVID-19 Misinformation Shared Online

Since its beginning, the COVID-19 pandemic has been a major topic of discussion on social media. People often tweet claims about COVID-19 alongside cited sources that make them appear truthful, but are the claims being made always true?

Professor Ritwik Banerjee of the Stony Brook University Department of Computer Science, Stony Brook PhD alumnus Chaoyuan Zuo of the School of Journalism and Communications at Nankai University, and additional researchers from Colorado State University investigated this topic in their paper, “Seeing Should Probably not be Believing: The Role of Deceptive Support in COVID-19 Misinformation on Twitter,” building on Stony Brook’s previous work on misinformation.

“If somebody’s tweeting about something and the tweet contains a link to a news article, is the statement in the tweet actually true to what the original article was saying? That’s the main story,” says Banerjee. 

This research involved multiple stages of work, starting with a dataset by Banda et al. containing more than 383 million tweets about COVID-19. 

“What we did was called transfer learning where you take an original mathematical representation of information presented in human language and modify it through an algorithm that is learning not just from the original data (Banda et al.), but also your additional annotations. After this transfer, the representations are better suited for your particular research,” says Banerjee. 
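The idea Banerjee describes, starting from a representation learned on one corpus and adapting it with new annotations, can be illustrated in miniature. The sketch below is purely illustrative: the study used learned language representations, not these toy two-number features, but the pattern is the same, in that training begins from "pretrained" weights and continues on a small set of new annotations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(weights, examples, lr=0.5, epochs=50):
    """Continue training pretrained weights on newly annotated examples
    (plain logistic regression trained by gradient descent)."""
    w = list(weights)  # start from the pretrained representation
    for _ in range(epochs):
        for features, label in examples:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            err = pred - label
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w

# Hypothetical 2-feature representation "pretrained" on a large corpus.
pretrained = [0.1, -0.2]
# New task-specific annotations: (features, label) pairs.
annotations = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
tuned = fine_tune(pretrained, annotations)
```

After fine-tuning, the weights have shifted to fit the new annotations: the first feature now pushes predictions toward the positive label and the second toward the negative one.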

The tweets from the dataset were first put through a filtering process to rule out tweets that were not useful for this particular research, such as tweets that did not contain a cited news article.
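A minimal version of such a filter might check each tweet for a link to a news outlet. The domain list and matching rule below are illustrative assumptions, not the paper's actual criteria.

```python
import re

# Illustrative set of news domains; the study's actual source criteria may differ.
NEWS_DOMAINS = {"nytimes.com", "reuters.com", "cnn.com"}

URL_PATTERN = re.compile(r"https?://(?:www\.)?([^/\s]+)")

def cites_news_article(tweet_text):
    """Keep only tweets whose text links to a recognized news outlet."""
    return any(match.group(1).lower() in NEWS_DOMAINS
               for match in URL_PATTERN.finditer(tweet_text))

tweets = [
    "Masks work! https://www.reuters.com/article/xyz",
    "Masks work! No source needed.",
]
kept = [t for t in tweets if cites_news_article(t)]
```

Here only the first tweet survives the filter, since the second cites no article at all.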

Next, the remaining tweets went through a data cleaning process so that the parts of the research involving natural language processing (NLP) could be conducted. Normalization steps were applied, such as removing hashtags and writing out abbreviations.
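These cleaning steps can be sketched with simple text substitutions. The abbreviation map below is a made-up example; the study's actual normalization rules are not spelled out in the article.

```python
import re

# Illustrative abbreviation map; the study's normalization rules may differ.
ABBREVIATIONS = {"govt": "government", "ppl": "people"}

def clean_tweet(text):
    """Normalize a tweet for NLP: drop hashtag symbols and expand abbreviations."""
    text = re.sub(r"#(\w+)", r"\1", text)  # "#covid" -> "covid"
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(abbr)}\b", full, text)
    return text

cleaned = clean_tweet("#COVID19 govt says ppl should mask")
```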

After data cleaning, the tweets entered the first task, in which annotators classified tweets as check-worthy or not check-worthy. These annotations were then used for supervised learning.

“In a normal sentence, a claim extraction system will extract very simple statements as claims because grammatically they are claims,” says Banerjee, “but in terms of fact checking, they’re not really claims we want to fact check because they are very obvious statements.”

The final part consisted of the second task, in which the tweets were determined to be faithful (or not) to the cited source. The tweets were first filtered again to remove posts containing multiple sentences, particularly those with personal opinions, sarcasm, or other commentary on the check-worthy claim. Annotators then determined whether the cited source supports the tweet, labeling each tweet as supported or unsupported. This task also made use of distant supervision.

The study found that “a significant fraction of check-worthy claims - 27.5% of the annotated sample (which corresponds to at least 1% of the test data) - contain deceptive support,” revealing that even though a tweet may appear truthful, that is not always the case.

“The existence of the citation gives us this perception that it’s trustworthy,” says Banerjee. It may be tempting to automatically trust a social media post when you see a cited source, but before you believe what you read, keep this research in mind.

-Sara Giarnieri, Communications Assistant