Title: Cultural Biases, World Languages, and User Privacy in Large Language Models
Abstract: In this talk, I will highlight three key aspects of large language models: (1) cultural bias in LLMs and pre-training data, (2) a decoding algorithm for low-resource languages, and (3) human-centered design for real-world applications.

The first part focuses on systematically assessing LLMs' favoritism towards Western culture. We take an entity-centric approach to measure the cultural biases among LLMs (e.g., GPT-4, Aya, and mT5) through natural prompts, story generation, sentiment analysis, and named entity tasks. One interesting finding is that a potential cause of cultural biases in LLMs is the extensive use and upsampling of Wikipedia data during the pre-training of almost all LLMs. The second part will introduce a constrained decoding algorithm that can facilitate the generation of high-quality synthetic training data for fine-grained prediction tasks (e.g., named entity recognition, event extraction). This approach outperforms GPT-4 on many non-English languages, particularly low-resource African languages. Lastly, I will showcase an LLM-powered privacy preservation tool designed to safeguard users against the disclosure of personal information. I will share findings from an HCI user study that involves real Reddit users utilizing our tool, which in turn informs our ongoing efforts to improve the design of AI models.
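The abstract does not spell out the constrained decoding algorithm, so the following is only a generic sketch of the idea: at each step, tokens that would violate a constraint are masked out before the best remaining token is chosen. The per-step scores, the bracket-marker constraint, and the function names are all hypothetical and are not taken from the talk.

```python
def constrained_greedy_decode(step_scores, is_allowed):
    """At each step, pick the best-scoring token that the constraint permits."""
    output = []
    for scores in step_scores:
        # Mask out every token the constraint rejects given the current prefix.
        candidates = {tok: s for tok, s in scores.items() if is_allowed(output, tok)}
        if not candidates:
            raise ValueError("constraint rules out every token at this step")
        output.append(max(candidates, key=candidates.get))
    return output


def single_balanced_span(prefix, token):
    """Hypothetical constraint: exactly one '[' ... ']' span, properly ordered."""
    open_spans = prefix.count("[") - prefix.count("]")
    if token == "[":
        return open_spans == 0 and "[" not in prefix
    if token == "]":
        return open_spans == 1
    return True


# Made-up model scores for four decoding steps (assumption, for illustration only).
step_scores = [
    {"]": 0.6, "[": 0.3, "Nairobi": 0.1},
    {"Nairobi": 0.7, "]": 0.2, "[": 0.1},
    {"[": 0.5, "]": 0.4, "is": 0.1},
    {"is": 0.8, "]": 0.1, "[": 0.1},
]

decoded = constrained_greedy_decode(step_scores, single_balanced_span)
print(decoded)  # ['[', 'Nairobi', ']', 'is']
```

Without the constraint, greedy decoding on these scores would emit a closing marker before any opening one; the mask is what keeps the entity span well-formed.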
Bio:

Wei Xu is an Associate Professor in the College of Computing and Machine Learning Center at the Georgia Institute of Technology, where she is the director of the NLP X Lab. Her research interests are in natural language processing and machine learning, with a focus on Generative AI, robustness and fairness of large language models, multilingual LLMs, as well as AI for science, education, accessibility, and privacy research. She is a recipient of the NSF CAREER Award, Google Academic Research Award, CrowdFlower AI for Everyone Award, and Best Paper Awards and Honorable Mentions at COLING'18, ACL'23, and ACL'24. She has also received research funding from DARPA and IARPA. She is currently an executive board member of NAACL.

Join Zoom Meeting
https://stonybrook.zoom.us/j/98855994362?pwd=F2qnpwL85fhCBHAEW9ZBpXihfw… (ID: 98855994362, passcode: 172797)

Join by phone
(US) +1 646-876-9923 (passcode: 172797)


Meeting host: H.Andrew.Schwartz@stonybrook.edu

Join Zoom Meeting:
https://stonybrook.zoom.us/j/98855994362?pwd=F2qnpwL85fhCBHAEW9ZBpXihfwGHsj.1

Join the Department of Computer Science as we welcome Lyle Ungar, University of Pennsylvania, who will be delivering a lecture on 'Measuring Cultural Variation using Natural Language Processing.'

When: 11/08/24 @ 2:30 PM
Where: New Computer Science Building, Room 120.

Reception to follow.

Abstract: Cultures vary widely in how they view the world, for example being more individualist or collectivist. Such cultural differences are, of course, reflected in the words that people use. We first show a variety of ways in which multilingual language models are not multicultural; they speak Hindi or Mandarin, but still think like Americans. In contrast, we then present a scalable method that uses embedding-derived lexica to successfully measure regional variation in culture.

Bio: Lyle Ungar is a Professor of Computer and Information Science at the University of Pennsylvania, where he also holds secondary appointments in Psychology, Bioengineering, Genomics and Computational Biology, and Operations, Information and Decisions. His group uses natural language processing and explainable AI for psychological research, including analyzing social media and cell phone sensor data to better understand the drivers of physical and mental well-being. They are currently building socio-emotionally sensitive GPT-based tutors and coaches.


Abstract: This talk shows how machine learning can address challenges in astrophysics. We specifically focus on black hole simulations and supernova observations. First, we present a super-resolution technique for black hole simulations that avoids the need for high-resolution labels by leveraging the Hamiltonian and momentum constraints from general relativity. This method reduces constraint violations by one to two orders of magnitude. Next, we introduce Maven, a multimodal foundation model for supernova science. Using contrastive learning to align photometric and spectroscopic data, Maven achieves state-of-the-art results in classification and redshift estimation by pre-training on synthetic data and fine-tuning on real observations.
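As a rough illustration of what aligning two modalities with contrastive learning can look like, the sketch below computes a symmetric InfoNCE-style loss over paired embeddings. It assumes a CLIP-like objective, which the abstract does not confirm; the two-dimensional "photometry" and "spectra" vectors, the temperature value, and the function names are made up for the example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of selecting the matching positive for each anchor."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarities to every candidate, scaled by the temperature.
        logits = [dot(anchor, p) / temperature for p in positives]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax at the true pair
    return total / len(anchors)


def symmetric_contrastive_loss(xs, ys, temperature=0.1):
    """Average the loss in both directions (x to y and y to x)."""
    return 0.5 * (info_nce(xs, ys, temperature) + info_nce(ys, xs, temperature))


# Toy embeddings (assumption): row i of each list describes the same object.
photometry = [[1.0, 0.0], [0.0, 1.0]]
spectra_aligned = [[0.9, 0.1], [0.1, 0.9]]
spectra_swapped = [[0.1, 0.9], [0.9, 0.1]]

aligned_loss = symmetric_contrastive_loss(photometry, spectra_aligned)
swapped_loss = symmetric_contrastive_loss(photometry, spectra_swapped)
print(aligned_loss < swapped_loss)  # True
```

The loss is small when each object's two embeddings are closer to each other than to any other object's, which is the property a trained encoder pair would be pushed toward.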

Bio: Thomas Helfer is a computational physicist specializing in deep learning and physics. Currently based at the Institute for Advanced Computational Science at Stony Brook University, Thomas was previously a postdoctoral fellow at Johns Hopkins and did his PhD with Eugene Lim at King's College in London. In his work, he looks to bridge topics; in his PhD, he bridged theoretical particle physics and gravitational waves. Now, in his postdoctoral work, he aims to find novel applications of deep learning in astrophysics.

*Please note: this seminar will be held in a hybrid format*


Location: IACS Seminar Room OR Join Zoom Meeting
https://stonybrook.zoom.us/j/98617630652?pwd=tb4hplPgb3bTTifPCJTCcsn3P9vX8y.1

Meeting ID: 986 1763 0652
Passcode: 882994

This symposium will highlight how artificial intelligence (AI) can assist in dementia detection, research and clinical care. For example, the use of robotics to assist with dementia care therapy is truly inspirational and cutting-edge for clinicians, trainees and the community at large, including assisted living facilities. The symposium will also focus on the role of AI in early detection of dementia and in identifying characteristics associated with future cognitive decline.

Learn more and register at https://cme.stonybrookmedicine.edu/continuing-medical-education/conferences/233/alzheimers-symposium-ai-the-future-of-dementia-care-2024/11/15/2024

Title: Building foundation models for scientific data

Speakers: Ruben Ohana, Ph.D., and Michael McCabe, Ph.D., Flatiron Institute, New York

Abstract: Foundation models are very large architectures trained on large-scale datasets that can be used to transfer knowledge from one domain to another. Scientific data, particularly numerical simulations of partial differential equations (PDEs), presents unique challenges due to its complexity and the need for domain expertise to assess prediction quality, which complicates building the first foundation models in this field. In this talk, we will describe our approach to building foundation models for scientific data, highlighting the requirements and expectations for achieving meaningful results. We will also introduce The Well, a comprehensive collection of datasets encompassing multi-scale simulations of fluid dynamics, astrophysics, and biological systems. The Well serves as a foundation for developing models that generalize across diverse physical phenomena, aiming to accelerate scientific discovery through large-scale learning.

Join Zoom Meeting: https://bnl.zoomgov.com/j/1606898802?pwd=GbbPiLGHlEokDskxjeFheMFWfuboxO.1
Meeting ID: 160 689 8802
Passcode: 281575

We live in a new scientific paradigm: the Big Data era, in which large amounts of data are available on almost any subject. In this new paradigm, the driving force is to use data directly to learn about chemical and physical systems by employing artificial intelligence. This paradigm has proven helpful in simulating realistic physical, biological, and chemical models, yielding impressive results. Similarly, the insight gained in these situations can be used to improve our understanding of fundamental processes. In that regard, we want to answer the question: Can a machine learn chemistry? The answer to this question is still debatable, but we will show our ideas and methods for finding it. We will also discuss our results on predicting atom-diatom reactions, along with other avenues and work in progress in our group.

Please register for the STEM Speaker Series: Can a Machine Learn Chemistry? here.

Discover how U.S. Census Bureau tools can help you find free data for your research projects, community, and more. See how to access the latest American Community Survey and 2020 Census data for various geographies, including New York City and Long Island, at data.census.gov. Learn about Community Resilience Estimates and how to navigate My Community Explorer, an interactive map-based tool that highlights demographic and socioeconomic data that measure inequality. This session will involve live demonstrations and hands-on exercises for participants. Registrants will receive the Zoom link one day prior to the event.

Please Register for SBU Libraries' AI Club: Exploring Census Data here.

Abstract:
Coarse-grained (CG) models alleviate the drawbacks of all-atom simulations. The latter still pose challenges because they are computationally expensive and give access to limited spatiotemporal scales, despite the use of modern high-performance computing clusters. CG models ignore some of the atomistic degrees of freedom, leading to fewer interatomic interactions and hence less computing time. Introducing such models emphasizes the need to properly manage these multiple scales by carefully deriving potentials and reconstructing conformations from their CG representations, usually with the help of Machine Learning. Following a bottom-up, force-matching approach, we train a Physics-Informed Neural Network to extract the CG force field parameters from all-atom simulation data. We verify our approach by applying it to fibrin monomers to study multiple-fibrin polymerization in solution at the microsecond scale, after modifying the force field to incorporate further non-bonded interactions not present in the training data. Access to these scales will allow us to study the effects of some of the molecules' components. Furthermore, we modify recent solutions in data-driven protein backmapping. Taking advantage of the developments in graph neural networks and variational inference, we introduce an intermediate step in the all-atom reconstruction of a molecule given its CG configuration, in an attempt to more accurately de-coarsen structures whose atom-to-CG-beads ratio is very high. The combined effect of our new forward and inverse coarse-graining methodology will enable the in silico study of many phenomena that are highly dynamic and intrinsically multiscale.
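To make the force-matching idea concrete, here is a minimal sketch that recovers the parameters of a single harmonic CG bond by least squares against reference forces. It deliberately swaps the Physics-Informed Neural Network described above for ordinary linear regression, and the functional form F(r) = -k (r - r0), the numerical values, and the function name are assumptions chosen only for illustration.

```python
def fit_harmonic_bond(distances, forces):
    """Least-squares fit of F(r) = -k * (r - r0) to reference forces."""
    n = len(distances)
    mean_r = sum(distances) / n
    mean_f = sum(forces) / n
    # Ordinary least squares for the line F = a * r + b.
    cov_rf = sum((r - mean_r) * (f - mean_f) for r, f in zip(distances, forces))
    var_r = sum((r - mean_r) ** 2 for r in distances)
    a = cov_rf / var_r
    b = mean_f - a * mean_r
    # Map the line back to bond parameters: F = -k * r + k * r0.
    k = -a
    r0 = b / k
    return k, r0


# Synthetic stand-in for forces projected from an all-atom trajectory (assumption).
true_k, true_r0 = 250.0, 0.38
distances = [0.30, 0.34, 0.38, 0.42, 0.46]
forces = [-true_k * (r - true_r0) for r in distances]

k, r0 = fit_harmonic_bond(distances, forces)
print(round(k, 6), round(r0, 6))  # 250.0 0.38
```

With noisy forces from a real trajectory the fit would only approximate the underlying parameters, and richer potentials would need a more expressive model than a straight line.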

Bio:
Georgios Kementzidis is a third-year PhD student in the Department of Applied Mathematics and Statistics at Stony Brook University. His advisor is Dr. Yuefan Deng. His research interests lie at the intersection of Computational Science, molecular dynamics (MD) simulations, and Machine Learning (ML) applications to Computational Biophysics. He is particularly interested in coarse-graining and multi-scale simulations.

*Note: this seminar will be held in-person (food provided on a first-come, first-served basis) and online*

Join Zoom Meeting https://stonybrook.zoom.us/j/99510099036?pwd=EyowuLBGvUVLZDBlG6F6chkMICFOZ7.1
Meeting ID: 995 1009 9036
Passcode: 132419

The SUNY Office of Research, Innovation & Economic Development (ORIED) is hosting a webinar, Pathways to Innovation: Exclusive STEM Opportunities for Students at Premier Labs, with the Air Force Research Laboratory (AFRL), the Griffiss Institute and Brookhaven National Laboratory (BNL).

Please join us on October 30 from 12:30 - 2:00 pm to learn more about the labs and the wide variety of research, education, and workforce development programs they offer.

Register here: https://rfsuny.zoom.us/webinar/register/WN_fjWNU9l8Sr6WO_M3AoZ-Rw?mc_cid=50c2045945&mc_eid=357e15f9df#/registration

The Provost's Spotlight Talks feature eminent visitors to the university as well as Stony Brook faculty members who have recently been recognized for outstanding contributions in their field.

Transmedia artist Stephanie Dinkins, Kusama endowed chair in art in the College of Arts and Sciences at Stony Brook University, brings her expertise in AI to the next Spotlight Talk with The Stories We Encode: AI, Love and the Future of Algorithmic Care on Tuesday, October 22, at 3:30 pm in the Charles B. Wang Center Theatre.

Working at the intersection of emerging technologies and social collaboration, Dinkins was named one of the 2023 TIME 100 Most Influential People in AI. She was recognized for her work with Not the Only One, an ongoing project in which she trained an AI on three generations of Black women to give it cultural roots, a deep history, and a perspective that existing systems do not offer.

The event is free and open to the public, and the discussion will be followed by a reception in the Wang Theatre lobby, hosted by the College of Arts and Sciences for new and promoted faculty.


About the Talk

AI's impact on society necessitates addressing longstanding human rights issues and prejudices. To ensure AI benefits humanity, we must confront institutional biases, rethink our relationship with other beings and emerging technologies, and reconcile ideals with actual power structures. This involves recognizing systemic inequalities, redefining human identity, and equitably distributing resources. AI, if developed and used ethically, offers an opportunity to reimagine a more equitable world for all inhabitants.