Cultural Biases, World Languages, and User Privacy in Large Language Models

Location

New Computer Science-2-Room 220 (50 Seats)

Event Description

Title: Cultural Biases, World Languages, and User Privacy in Large Language Models
Abstract: In this talk, I will highlight three key aspects of large language models: (1) cultural bias in LLMs and pre-training data, (2) a decoding algorithm for low-resource languages, and (3) human-centered design for real-world applications.

The first part focuses on systematically assessing LLMs' favoritism towards Western culture. We take an entity-centric approach to measuring cultural biases across LLMs (e.g., GPT-4, Aya, and mT5) through natural prompts, story generation, sentiment analysis, and named entity tasks. One interesting finding is that a potential cause of cultural biases in LLMs is the extensive use and upsampling of Wikipedia data during the pre-training of almost all LLMs.

The second part will introduce a constrained decoding algorithm that facilitates the generation of high-quality synthetic training data for fine-grained prediction tasks (e.g., named entity recognition, event extraction). This approach outperforms GPT-4 on many non-English languages, particularly low-resource African languages.

Lastly, I will showcase an LLM-powered privacy preservation tool designed to safeguard users against the disclosure of personal information. I will share findings from an HCI user study in which real Reddit users used our tool; these findings in turn inform our ongoing efforts to improve the design of AI models.

Bio:

Wei Xu is an Associate Professor in the College of Computing and Machine Learning Center at the Georgia Institute of Technology, where she is the director of the NLP X Lab. Her research interests are in natural language processing and machine learning, with a focus on Generative AI, robustness and fairness of large language models, multilingual LLMs, as well as AI for science, education, accessibility, and privacy research. She is a recipient of the NSF CAREER Award, the Google Academic Research Award, the CrowdFlower AI for Everyone Award, and Best Paper Awards and Honorable Mentions at COLING'18, ACL'23, and ACL'24. She has also received research funding from DARPA and IARPA. She is currently an executive board member of NAACL.

Join Zoom Meeting
https://stonybrook.zoom.us/j/98855994362?pwd=F2qnpwL85fhCBHAEW9ZBpXihfwGHsj.1 (ID: 98855994362, passcode: 172797)

Join by phone
(US) +1 646-876-9923 (passcode: 172797)

Joining instructions: https://www.google.com/url?q=https://applications.zoom.us/addon/invitat…

Meeting host: H.Andrew.Schwartz@stonybrook.edu