Have you ever found yourself looking up medical symptoms online? Any search engine will generate a litany of sites to visit, packed with possible explanations for your set of symptoms. But are all of these claims true? The Internet, unfortunately, is not always the most reliable source of information. So how can the 60% of adults in the United States who have sought out health-related information be sure that what they find is true?
Many Americans never doubt the truthfulness of online health information. A team of Stony Brook researchers from the Department of Computer Science put this credibility to the test: MS student Dhruv Kela and Kritik Mathur, PhD candidates Chaoyuan Zuo and Noushin Salek Faramarzi, and Professor Ritwik Banerjee. Their research, titled “Beyond belief: a cross-genre study on perception and validation of health information online,” was published in the International Journal of Data Science and Analytics (2022). The team analyzed the perceived credibility of health-related information, and then tested whether that perception is deserved.
Online health information is often extracted from research papers and then regurgitated into articles. There, the information is cut down to a blunt medical claim, with limited supporting detail and no proof of its veracity. In some cases, an article may include a hyperlink to the research it claims to source, but even then, most readers do not go on to do their own fact-checking.
Medical misinformation is a catalyst for unwarranted panic and unrest. To combat this, the research explores “why and how misinformation finds success in sowing chaos,” says Banerjee. The team examines both the perceived credibility of medical information and the truthfulness of the claims made. Artificial intelligence, specifically Natural Language Processing (NLP), is “the backbone of this research,” says Banerjee.
Medical news is often written with a semblance of authority, using medical jargon and presenting claims that seem to be supported by research, leading readers to believe the content is credible. In many instances, readers take these claims at face value without vetting the information. It is this language that makes the primary claim of an article tricky to identify.
To complete this principal component of the research, claim identification was modeled as a sequence labeling task, a specific class of pattern recognition techniques.
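Sequence labeling assigns a tag to every token in a sentence, so that a claim emerges as a labeled span rather than a separate extraction step. The sketch below illustrates only the standard BIO tagging representation commonly used for such tasks; the paper's actual tagging scheme and model are not shown here, and the example sentence is invented for illustration.

```python
# Minimal sketch of framing claim identification as sequence labeling,
# using the common BIO scheme: B- marks the first token of a claim,
# I- marks tokens inside it, O marks everything else.
# (Illustrative only; not the authors' actual scheme or model.)

def bio_tags(tokens, claim_span):
    """Return a BIO tag for each token given a (start, end) claim span.

    `start` is inclusive, `end` is exclusive, as with Python slices.
    """
    start, end = claim_span
    tags = []
    for i in range(len(tokens)):
        if i == start:
            tags.append("B-CLAIM")
        elif start < i < end:
            tags.append("I-CLAIM")
        else:
            tags.append("O")
    return tags

# Hypothetical headline; suppose tokens 3..5 form the medical claim.
sentence = "New study shows coffee cures insomnia researchers say".split()
for token, tag in zip(sentence, bio_tags(sentence, (3, 6))):
    print(f"{token}\t{tag}")
```

A model trained on such tags predicts one label per token, and contiguous `B-`/`I-` runs are read back out as the article's primary claim.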
“Even harder was to identify whether the claim made in a news article is really being made in the cited research. This is an incredibly difficult problem, because medical research language is very different from the English we normally use,” says Banerjee.
To address this, the team constructed information retrieval algorithms to evaluate whether each claim actually matched the cited research.
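At its simplest, matching a news claim against a cited abstract means scoring how similar the two texts are. The sketch below uses plain bag-of-words cosine similarity as a stand-in; the paper's cross-genre retrieval models are more sophisticated, and the claim and abstract strings here are invented examples.

```python
# Illustrative claim-to-abstract matching via cosine similarity over
# token counts. (A simplified stand-in, not the authors' actual models.)
import math
from collections import Counter

def cosine(tokens_a, tokens_b):
    """Cosine similarity between two token lists as count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical example: a news claim, the abstract it cites, and an
# unrelated passage for comparison.
claim = "vitamin d supplements reduce risk of respiratory infection".split()
abstract = ("randomized trial of vitamin d supplementation and "
            "respiratory infection risk in adults").split()
unrelated = "stock markets rallied after the interest rate cut".split()

print(cosine(claim, abstract) > cosine(claim, unrelated))  # True
```

A retrieval system ranks candidate abstracts by such a score; the hard part the researchers highlight is that research prose and news prose use very different vocabulary, so naive lexical overlap like this often fails across genres.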
The findings of this work confirm a harsh reality: health-related news can be misleading. Analysis of the dataset made clear that not all articles contain supporting citations; some claims are blatantly propagated without evidence. And even when external sources are cited, medical newswire can still be deceptive.
Three annotators, all non-medical graduate students, independently scored whether the claims made in assigned medical newswire were supported by the abstract of the cited publication. The results are alarming: “only 27.09% of all claim–abstract pairs were given the highest score of 5 by both raters,” meaning the claim was fully supported. More concerningly, “7.39% got a score of 2 or lower,” indicating claims that were wholly unsubstantiated.
These statistics present a frightening insight into the medical media that we consume daily. It is misleading, as it often presents claims that do not have any footing. Even if some medical newswire is accurate, how can we ensure that what we are reading is trustworthy?
The crux of the issue is that information today is not always accurate. Whatever the cause, readers become unwitting recipients of dishonest and harmful information. In times such as this, it is important to remain vigilant about medical and domain-specific newswire as a whole.
“Our work indicates that cross-genre information retrieval may be the way to identify false or inaccurate information. But there is very little work done so far in cross-genre natural language information retrieval. I hope that our work sparks new interest and inspires more research in this direction,” says Banerjee.
-Alyssa Dey, Communications Assistant