AI Inovation Institute

Get hands-on with data cleaning techniques using Python and AI tools. Join SBU Libraries' Data Literacies Lead, Ahmad Pratama, to learn how to identify and rectify errors, handle missing data, and prepare your dataset for analysis. This workshop introduces you to powerful yet easy-to-use tools and techniques that make data cleaning efficient and effective, turning chaotic data into valuable insights.

Please register for the Data Cleaning with Python and AI here.

Read more about Data Cleaning with Python and AI

AI can help you write, you hear. AI can save you time, leverage your skills, enhance your productivity. . . . But you also hear: AI output is not reliable, not adequate for advanced tasks/learning, not ethical to use -- you could get in deep trouble for using AI tools without adequate mastery and caution. Which way is it?
Come join this hands-on workshop where you will explore AI tools and their affordances. Engage in writing tasks to learn how to use AI tools effectively and responsibly.
Sign up for a seat now: https://docs.google.com/forms/d/e/1FAIpQLSd0iDTKkTYnkxFd4LkgqbtP97zQSS4FI_MiPVm7p6IY5SGwSg/viewform

Read more about Writing Workshop - Using AI Tools

Title: Cultural Biases, World Languages, and User Privacy in Large Language Models
Abstract: In this talk, I will highlight three key aspects of large language models: (1) cultural bias in LLMs and pre-training data, (2) decoding algorithm for low-resource languages, and (3) human-centered design for real-world applications.

The first part focuses on systematically assessing LLMs' favoritism towards Western culture. We take an entity-centric approach to measure the cultural biases among LLMs (e.g., GPT-4, Aya, and mT5) through natural prompts, story generation, sentiment analysis, and named entity tasks. One interesting finding is that a potential cause of cultural biases in LLMs is the extensive use and upsampling of Wikipedia data during the pre-training of almost all LLMs. The second part will introduce a constrained decoding algorithm that can facilitate the generation of high-quality synthetic training data for fine-grained prediction tasks (e.g., named entity recognition, event extraction). This approach outperforms GPT-4 on many non-English languages, particularly low-resource African languages. Lastly, I will showcase an LLM-powered privacy preservation tool designed to safeguard users against the disclosure of personal information. I will share findings from an HCI user study that involves real Reddit users utilizing our tool, which in turn informs our ongoing efforts to improve the design of AI models.
Bio:

Wei Xu is an Associate Professor in the College of Computing and Machine Learning Center at the Georgia Institute of Technology, where she is the director of the NLP X Lab. Her research interests are in natural language processing and machine learning, with a focus on Generative AI, robustness and fairness of large language models, multilingual LLMs, as well as AI for science, education, accessibility, and privacy research. She is a recipient of the NSF CAREER Award, Google Academic Research Award, CrowdFlower AI for Everyone Award, Best Paper Awards and Honorable Mentions at COLING'18, ACL'23, ACL'24. She also received research funds from DARPA and IARPA. She is currently an executive board member of NAACL.

Join Zoom Meeting
https://stonybrook.zoom.us/j/98855994362?pwd=F2qnpwL85fhCBHAEW9ZBpXihfw… (ID: 98855994362, passcode: 172797)

Join by phone
(US) +1 646-876-9923 (passcode: 172797)

Joining instructions: https://www.google.com/url?q=https://applications.zoom.us/addon/invitat…

Meeting host: H.Andrew.Schwartz@stonybrook.edu

Join Zoom Meeting:
https://stonybrook.zoom.us/j/98855994362?pwd=F2qnpwL85fhCBHAEW9ZBpXihfwGHsj.1

Read more about Cultural Biases, World Languages, and User Privacy in Large Language Models

Join the Department of Computer Science as we welcome Lyle Ungar, University of Pennsylvania, who will be delivering a lecture on 'Measuring Cultural Variation using Natural Language Processing.'

When: 11/08/24 @ 2:30 PM
Where: New Computer Science Building, Room 120.

Reception to follow.

Abstract: Cultures vary widely in how they view the world, for example being more individualist or collectivist. Such cultural differences are, of course, reflected in the words that people use. We first show a variety of ways in which multilingual language models are not multicultural; they speak Hindi or Mandarin, but still think like Americans. In contrast, we then present a scalable method that uses embedding-derived lexica to successfully measure regional variation in culture.

Bio: Lyle Ungar is a Professor of Computer and Information Science at the University of Pennsylvania, where he also holds secondary appointments in Psychology, Bioengineering, Genomics and Computational Biology, and Operations, Information and Decisions. His group uses natural language processing and explainable AI for psychological research, including analyzing social media and cell phone sensor data to better understand the drivers of physical and mental well-being. They are currently building socio-emotionally sensitive GPT-based tutors and coaches.

Read more about Measuring Cultural Variation using Natural Language Processing

Abstract: This talk shows how machine learning can address challenges in Astrophysics. We specifically focus on black hole simulations and supernova observations. First, we present a super-resolution technique for black hole simulations that avoids the need for high-resolution labels by leveraging the Hamiltonian and momentum constraints from general relativity. This method reduces constraint violations by one to two orders of magnitude. Next, we introduce Maven, a multimodal foundation model for supernova science. Using contrastive learning to align photometric and spectroscopic data, Maven achieves state-of-the-art results in classification and redshift estimation by pre-training on synthetic data and fine-tuning on real observations.

Bio: Thomas Helfer is a computational physicist specializing in deep learning and physics. Currently based at the Institute for Advanced Computational Science at Stony Brook University, Thomas was previously a postdoctoral fellow at Johns Hopkins and did his PhD with Eugene Lim at King's College in London. In his work, he looks to bridge topics; in his PhD, he bridged theoretical particle physics and gravitational waves. Now, in his postdoctoral work, he aims to find novel applications of deep learning in astrophysics.

*please note: this seminar will be held in a hybrid format*

Location: IACS Seminar Room OR Join Zoom Meeting
https://stonybrook.zoom.us/j/98617630652?pwd=tb4hplPgb3bTTifPCJTCcsn3P9vX8y.1

Meeting ID: 986 1763 0652
Passcode: 882994

Read more about Applying Deep Learning in Astrophysical Simulations and Observations

This symposium will highlight how artificial intelligence (AI) can assist in dementia detection, research and clinical care. For example, the use of robotics to assist with dementia care therapy is truly inspirational and cutting-edge for clinicians, trainees and the community at large, including assisted living facilities. The symposium will also focus on the role of AI in early detection of dementia and in identifying characteristics associated with future cognitive decline.

Learn more and register at https://cme.stonybrookmedicine.edu/continuing-medical-education/conferences/233/alzheimers-symposium-ai-the-future-of-dementia-care-2024/11/15/2024

Read more about Alzheimer's Disease Symposium: Artificial Intelligence-The Future of Dementia Care

Title: Building foundation models for scientific data Seminar

Speaker: Ruben Ohana, Ph.D. and Michael McCabe, Ph.D - Flatiron Institute, New York

Abstract: Foundation models are very large architectures trained on large-scale datasets and can be used to transfer knowledge from a domain to another. Scientific data, particularly numerical simulations of partial differential equations (PDEs), presents unique challenges due to its complexity and the need for domain expertise to assess prediction quality, complicating the building of the first foundation models in this field. In this talk, we will develop our approach of building foundation models for scientific data, highlighting the requirements and expectations for achieving meaningful results. We will also introduce The Well, a comprehensive collection of datasets encompassing multi-scale simulations of fluid dynamics, astrophysics, and biological systems. The Well serves as a foundation for developing models that generalize across diverse physical phenomena, aiming to accelerate scientific discovery through large-scale learning.

Join Zoom Meeting: https://bnl.zoomgov.com/j/1606898802?pwd=GbbPiLGHlEokDskxjeFheMFWfuboxO.1
Meeting ID: 160 689 8802
Passcode: 281575

Read more about Brookhaven AI/ML Seminar (AIMS)

We live in a new scientific paradigm: the Big Data era, in which a lot of data is available for almost anything. In this new paradigm, the driving force is to use data directly to learn about chemical and physics systems employing artificial intelligence. This paradigm has proven helpful in simulating realistic physical, biological, and chemical models, yielding impressive results. Similarly, the insight gained in these situations can be used to improve our understanding of fundamental processes. In that regard, we want to answer the question: Can a machine learn chemistry? The answer to this question is still debatable, but we will show our ideas and methods to find the answer. We will also discuss our results on predicting atom-diatom reactions and other avenues and work in progress in our group.

Please register for the STEM Speaker Series: Can a Machine learn Chemistry here.

Read more about STEM Speaker Series: Can a machine learn chemistry?

Discover how U.S. Census Bureau Tools can help you find free data for your research projects, community, and more. See how to access the latest American Community Survey and 2020 Census data for various geographies including New York City and Long Island at data.census.gov. Learn about Community Resilience Estimates and how to navigate My Community Explorer; an interactive map-based tool which highlights demographic and socioeconomic data that measure inequality. This session will involve live demonstrations and hands-on exercises for participants. Registrants will receive the Zoom link one day prior to the event.

Please Register for SBU Libraries' AI Club: Exploring Census Data here.

Read more about Stony Brook University Libraries' AI Club Presents: Exploring Census Data with the U.S. Census Bureau

Abstract:
Coarse grained (CG) models alleviate the drawbacks of all-atom simulations. The latter still pose challenges because they are computationally expensive and give access to limited spatiotemporal scales, despite the use of modern high-performance computing clusters. CG models ignore some of the atomistic degrees of freedom, leading to fewer interatomic interactions, hence less computing time. Introducing such models emphasizes the need to properly manage these multiple scales, by carefully deriving potentials and reconstructing conformations from their CG representations, usually with the help of Machine Learning. Following a bottom-up and force matching approach, we train a Physics-Informed Neural Network to extract the CG force field parameters from all-atom simulation data. We verify our approach by applying it to fibrin monomers to study multiple-fibrin polymerization in solution at the microsecond scale, after modifying the force field to incorporate further non-bonded interactions, not present in the training data. Access to these scales will allow us to study the effects of some of the molecules' components. Furthermore, we modify recent solutions in data-driven protein backmapping. Taking advantage of the developments in graph neural networks and variational inference, we introduce an intermediate step in the all-atom reconstruction of a molecule given its CG configuration, in an attempt to more accurately de-coarsen structures whose atom-to-CG-beads ratio is very high. The combined effect of our new forward and inverse coarse graining methodology will enable the in silico study of many phenomena that are highly dynamic and intrinsically multiscale.

Bio:
Georgios Kementzidis is a third year PhD student in the Department of Applied Mathematics and Statistics at Stony Brook University. His advisor is Dr. Yuefan Deng. His research interests lie at the intersection of Computational Science, molecular dynamics (MD) simulations, and Machine Learning (ML) applications to Computational Biophysics. He is particularly interested in coarse-graining and multi-scale simulations.

*Note: this seminar will be held in-person (food provided on a first-come, first serve basis) and online*

Join Zoom Meeting https://stonybrook.zoom.us/j/99510099036?pwd=EyowuLBGvUVLZDBlG6F6chkMICFOZ7.1
Meeting ID: 995 1009 9036
Passcode: 132419

Read more about AI-Guided Management of Multiple Scales for Molecular Dynamics Simulations