With the COVID-19 pandemic, anti-Asian sentiment, especially Sinophobia, has become a major topic of discussion on social media. As a result, many studies have examined anti-Asian sentiment during the pandemic, but most leave out data from before it. This gap feeds the misconception that anti-Asian sentiment is a recent phenomenon. In reality, it has existed for decades.
Yongjun (Josh) Zhang, Assistant Professor in the Stony Brook University Department of Sociology and Institute for Advanced Computational Science, and Yifan Sun, Assistant Professor in the Stony Brook University Department of Computer Science, studied anti-Asian sentiment in research titled “Multiplex Anti-Asian Sentiment before and during the Pandemic: Introducing New Datasets from Twitter Mining.” Using these datasets, Professor Zhang and Professor Sun showed how anti-Asian sentiment intensified once the pandemic began.
“Scapegoating minorities in the United States is a long-term thing, it’s not new,” says Zhang.
The research brought together experts from several departments.
“You see a trend. Social scientists are working with computer science scholars, and computer science scholars are working with social scientists. I think this is kind of an emerging field,” says Zhang.
The researchers mined the Twitter Historical Database to trace how anti-Asian sentiment developed before and during COVID-19. Specifically, they used “computational tools with natural language processing and machine learning methods to detect hate speech on Twitter before and during the pandemic” (Lin et al.). To find relevant tweets, they searched for hate and counter-hate keywords and hashtags, such as the hate hashtag “MakeChinaPay” and the counter-hate hashtag “StopAAPIHate” (Lin et al.). AAPI stands for Asian American and Pacific Islander.
“Twitter has an API (Application Programming Interface) which allows scholars to query its historical database, unlike other media platforms where you can only get partial data, maybe some random sample,” says Zhang.
The 68.38 million tweets in the study’s datasets were sorted into four categories: COVID-related anti-AAPI tweets, non-COVID-related anti-AAPI tweets, tweets about Chinese politics, and counter-hate tweets (Lin et al.).
The study found that the political climate during the pandemic shaped anti-Asian sentiment. For example, the datasets show a rise in COVID-19-related hate terms after former President Donald Trump tweeted the term “Chinese Virus” (Lin et al.).
“It’s very astonishing to see that this hate speech and abusive language on Twitter is everywhere, especially after U.S. politicians, specifically our former president, tweeted Chinese Virus… there was a huge spike after that,” says Zhang.
“As long as we can definitively show that it’s there, it then becomes up to the policymakers and Twitter’s CEO board to decide okay, is this really the tone we want for our platform? If not, are there measures we can take to counteract it?” says Sun.
These datasets lay the groundwork for future research, especially in areas such as “computational social science, machine learning, and hate speech detection” (Lin et al.).
“This is the first step. It’s really important to get the first step right,” says Sun.
-Sara Giarnieri, Communications Assistant