Did a Person Write That, or Was It AI?

Can you tell which of these two reviews was written by a person?

  1. Absolutely stellar! From the moment I walked in, the Hilton Chicago exceeded expectations. The room was spacious, elegant, and offered breathtaking views of the city. The staff was incredibly friendly and accommodating, ensuring a seamless stay. With its prime location and top-notch amenities, this hotel is a definite must for anyone visiting Chicago.
  2. Stayed here for several nights. The hotel is quite rich in history and they do a great job of sharing that throughout the hotel. The location is terrific and within walking distance to so many things. Our room was well-appointed and housekeeping did a good job.

The answer: 2

Surprised?

You are not alone.

A recent study conducted by Stony Brook University researchers looked at the reviews of 20 hotels in Chicago, trying to distinguish between the experiences shared by humans on TripAdvisor and those generated by OpenAI’s ChatGPT.

Nikita Soni, a Ph.D. student at Stony Brook, says, “When OpenAI launched ChatGPT in 2022, it took on 100 million users within just two months. Its reach was tremendous, and it found applications across several industries. Thousands of organizations are now using AI models like ChatGPT to generate code, provide explanations, educate people, and write emails, blogs, and academic essays.” In short, because these AI models are trained on content from the internet, much of which is inaccurate, it has become increasingly important to distinguish text written by a human being from text generated by AI.

Even though Generative AI tools like ChatGPT and Claude have only been around for a couple of years, the content they create is of such high quality that it’s often indistinguishable from text written by humans. What’s more, beyond daily tasks and language translation, these AI models are also learning about human nature by comparing their own outputs to human-generated content. A recent study even suggests that their moral judgment and decision-making are almost perfectly aligned with those of humans.

So how do we differentiate between human and AI-generated content? According to Vasudha Varadarajan, Ph.D. student and Research Assistant at Stony Brook, the approach needs to be different from current methods. Instead of trying to understand how the AI model was designed, we should focus on its output. “This approach,” she adds, “allows us to study the language differences between human and AI texts rather than get into the complexities of how the AI model works.”

Nikita and Vasudha explored this idea alongside Siddharth Mangalik, a Ph.D. student, and Professor H. Andrew Schwartz, Director of HLAB and Associate Professor in the Department of Computer Science at Stony Brook University. They looked at studies that propose using emotional, personality, and demographic cues (like age and gender) to differentiate text generated by humans from text generated by advanced language models.

According to Siddharth, “Our logic is based on studies of bots which show that they have very little variation in emotionality, personality, and demographics. Bots only show positive emotional sentiments when compared to human authors.”

It’s true: AI-generated texts consistently contain more positive emotional language, more adjectives, and more analytic writing. Unlike humans, they tend to be neutral by default and lack emotional expression. In fact, recent studies show that ChatGPT’s output contains significantly less negative emotion, less hate speech, and less punctuation than human-authored texts, and it has even been found to score lower on measures of purpose and readability.
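To make these signals concrete, here is a minimal, illustrative sketch in Python (not the researchers’ actual pipeline) that computes a few of the surface features mentioned above. The word lists are toy placeholders; real studies rely on validated lexicons or trained sentiment models.

    import string

    # Toy word lists for illustration only; actual studies use
    # validated lexicons or trained sentiment models.
    POSITIVE_WORDS = {"stellar", "spacious", "elegant", "breathtaking",
                      "friendly", "seamless", "terrific", "great",
                      "comfortable", "good"}
    NEGATIVE_WORDS = {"dirty", "rude", "noisy", "broken", "terrible"}

    def surface_features(text):
        """Compute per-word rates of a few simple stylistic signals."""
        words = [w.strip(string.punctuation).lower() for w in text.split()]
        words = [w for w in words if w]
        n = max(len(words), 1)
        punct = sum(ch in string.punctuation for ch in text)
        return {
            "positive_rate": sum(w in POSITIVE_WORDS for w in words) / n,
            "negative_rate": sum(w in NEGATIVE_WORDS for w in words) / n,
            "punctuation_per_word": punct / n,
        }

    # Compare snippets of the AI-generated (1) and human (2) reviews
    # from the opening example.
    print(surface_features("Absolutely stellar! The room was spacious, "
                           "elegant, and offered breathtaking views."))
    print(surface_features("Stayed here for several nights. The location "
                           "is terrific and housekeeping did a good job."))

Even this crude comparison surfaces the pattern the researchers describe: the AI review packs in noticeably more positive, adjective-heavy language per word.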

So, while Generative AI models like ChatGPT can produce grammatically correct content, they may struggle to capture the range of human emotion and personality, which are key elements of our language.

Building on these conclusions, the team examined thirteen psychologically grounded and fundamental human traits, including age, gender, openness, extraversion, and empathy, to compare the reviews of the 20 hotels in Chicago.
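As a rough sketch of how such trait scores might feed a detector (the numbers below are hypothetical placeholders, and the team’s actual system is more sophisticated), each review could be reduced to a vector of estimated traits and passed to a standard classifier:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical trait vectors: one row per review, scored on a subset
    # of the thirteen traits (say, estimated age, openness, extraversion,
    # empathy). In practice these scores come from trait-inference models.
    X = np.array([
        [35.0, 0.82, 0.75, 0.60],  # AI-generated
        [35.2, 0.80, 0.78, 0.58],  # AI-generated (note the low variation)
        [34.8, 0.81, 0.74, 0.61],  # AI-generated
        [24.0, 0.55, 0.40, 0.70],  # human
        [52.0, 0.30, 0.65, 0.45],  # human
        [41.0, 0.62, 0.20, 0.80],  # human (wide variation across authors)
    ])
    y = np.array([1, 1, 1, 0, 0, 0])  # 1 = AI-generated, 0 = human

    clf = LogisticRegression(max_iter=1000)
    print("cross-validated accuracy:",
          cross_val_score(clf, X, y, cv=3).mean())

The design intuition matches the studies cited above: because AI-generated text clusters tightly in trait space while human authors spread widely, even a simple linear classifier can separate the two.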

“Since models like ChatGPT are trained on hotel reviews that may or may not be true, we knew that the content they’ve been trained to create will be inherently deceptive,” says Nikita, “and our language detector needed to be able to recognize that.”

For example, in one generated hotel review, ChatGPT said, “The room was spacious, clean, and had all the amenities I needed for a comfortable stay. The bed was comfortable and I slept like a baby every night.” The language misleads the reader because it is ungrounded in the material world, and the review therefore engages in inherent deception: AI cannot have an experience like a human being, but it can write as if it did.

Professor H. Andrew Schwartz shares the results of their AI system, designed to use human traits to distinguish between human language and AI-generated text. “Experiments show that our system could differentiate between AI text and truthful humans with an accuracy of almost 97%, and between truthful and deceptive humans with an accuracy of 69%.”

Their findings show that ChatGPT is limited in its expression of human traits such as personality. Recent versions of ChatGPT write like an average 35-year-old person; they also score as more open and more emotionally stable than typical human authors, and they show a positivity bias. These results align with previous research.

As AI models continue to improve their ability to imitate human language, we need to find better ways to differentiate their content from ours. One way to do this is to systematically compare the accuracy of different language detection methods and build on those that perform best.

We also need to keep exploring how these classification methods perform on other tasks involving human and AI-generated text, especially in the fields of healthcare, disaster management, and safety, which are key to human survival.

 

Ankita Nagpal
Communications Assistant