Abstract: How do humans learn the sound patterns of their language? Despite a variety of methods and advances in phonotactic learning, there is still a paucity of computational research, methods, and data for languages with tones. In this talk, I will explore this question specifically in light of tone languages, where pitch plays a crucial role in distinguishing word meanings. I provide an implementation of the Bottom-Up Factor Inference Algorithm over Autosegmental Representations (BUFIA-AR), which learns the rules governing possible tone patterns. Using a dataset of Hausa, a West African tone language, the algorithm successfully identifies patterns that are not permitted in the language. These results (i) confirm long-standing linguistic generalizations, (ii) make more specific predictions about exceptional cases, and (iii) reveal previously unnoticed patterns. The results show how mathematical models of sound structure can be brought into dialogue with both linguistic theory and computational learning, highlighting the broader potential of formal approaches to capture human linguistic knowledge.

Bio: Han Li is a fifth-year Ph.D. student in the Department of Linguistics, specializing in computational linguistics under the supervision of Professor Jeff Heinz. Her research focuses on how sound patterns in language can be formally represented and computationally learned, bridging theoretical linguistics and computer science.

Location: Institute for Advanced Computational Science, Seminar Room

Zoom Meeting: https://stonybrook.zoom.us/j/94043459206?pwd=3ra47h8HghOFRfobRBjZaDMyTwialr.1
Meeting ID: 940 4345 9206
Passcode: 332717

Join us for an engaging panel discussion featuring researchers who participated in our inaugural AI JAM session on February 26th. Our panelists will share their firsthand experiences using large language models to tackle complex scientific problems, with a special focus on prompt engineering strategies, discussing both breakthroughs and challenges encountered during this collaborative initiative. Learn how these cutting-edge AI tools are being applied to real-world research questions and discover insights that could inform your own scientific endeavors. Attendees are encouraged to come prepared with questions about prompt engineering for the panel discussion.

Moderator: Adolfy Hoisie, Deputy Director, Computing and Data Sciences

Kevin Yager, Group Leader, AI-Accelerated Nanoscience, Center for Functional Nanomaterials
Lingda Li, Associate Computational Scientist, Systems, Architecture and Computing Technologies, Computing and Data Sciences
Liguo Wang, Director of Scientific Operations, Laboratory for BioMolecular Structure (LBMS), National Synchrotron Light Source II
Weiguo Yin, Physicist, Condensed Matter Theory, Condensed Matter Physics and Materials Science Department

Location: CDS, Bldg. 725, Training Room

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1606837837?pwd=Tc0mwQqLXpDfYOIaoaurmpLD2mMlzS.1

Passcode: 822553



Abstract: In high-dimensional data spaces, vast empty regions often exist where no known data points are present. These empty spaces are not merely gaps but hold untapped potential for discovering novel configurations, optimizing parameters, and improving decision-making processes. However, traditional exploration techniques struggle to identify and leverage these regions due to the curse of dimensionality. To address this, we introduce the Empty Space Search Algorithm (ESA), a scalable, physics-inspired method that systematically identifies and explores these uncharted voids. ESA operates by modeling the data space as a dynamic system, using a repulsion-attraction mechanism to locate optimal empty space configurations (ESCs) without requiring exhaustive search. Building upon ESA, we present GapMiner, a visual analytics system that integrates human-in-the-loop AI to iteratively refine and validate ESCs. GapMiner combines parallel coordinate visualization, interactive optimization, and deep learning-based predictive modeling to enhance the efficiency of empty space exploration. This methodology has broad applications, including accelerating convergence in evolutionary algorithms through a more diverse initial population, optimizing adversarial learning strategies, and discovering novel parameter configurations in reinforcement learning. Our approach demonstrates that empty space is not just an absence of data but a frontier for new possibilities in high-dimensional problem-solving.
Bio: Xinyu Zhang received his B.E. in Computer Science from Shandong University, Taishan College, in 2019. He is currently a final-year Ph.D. candidate in the Department of Computer Science at Stony Brook University, advised by Prof. Klaus Mueller. His research focuses on multivariate data analysis, scientific visualization, and reinforcement learning. He has published multiple papers in top-tier journals and conferences, including IEEE TVCG and NeurIPS.
*This seminar will be held in person (food provided on a first-come, first-served basis) and online (Zoom link below).
Topic: IACS Student Seminar Speaker: Xinyu Zhang
Time: Feb 26, 2025 12:00 PM Eastern Time (US and Canada)
Join Zoom Meeting
https://stonybrook.zoom.us/j/91848218975?pwd=lfITFa61GaXZ2Wsa1B1OnbLQMmXvOE.1

Meeting ID: 918 4821 8975
Passcode: 027337

This virtual presentation series is designed to inform the Stony Brook University research community about the Research Funding Landscape of key topic areas. Our Strategic Research Initiatives team will provide insight into the rapidly shifting funding environment using policy briefs, budgetary priorities, and relevant legislation. We will highlight federal and state priorities in the current and upcoming years to help Stony Brook researchers develop strategies for pursuing funding in a rapidly shifting environment. This series is moderated by Mónica Bugallo, Interim Vice President for Research & Innovation.

Join us for the third in the series, focused on the artificial intelligence landscape:


Translating the Funding Landscape for Stony Brook Researchers: Artificial Intelligence
Presented by Catherine Chen, Ph.D., Research Development Associate
Faculty Respondent: Assistant Professor Nav Nidhi Rajput, Department of Materials Science and Chemical Engineering
Wednesday, April 22, 2026, from 2 pm to 3 pm

Registration is Required

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Tuesday, January 7, 2025, 12:00 pm -- CDS, Bldg. 725, Training Room

Speakers

Chuntian Cao, CDS AID - Neural Network Potential (NNP) for Battery Electrolytes

Yeonju Go, NPP Physics - Generative AI for High-Energy Nuclear Physics

Gilchan Park, CDS AID - Graph RAG: Indexing, Retrieval and Generation

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1615289117?pwd=Hqkbj9itxWrFnkhZ8rQXHPInO2gxdF.1

Meeting ID: 161 528 9117
Passcode: 991382

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Tuesday, January 7, 2025, 12:00 pm -- CDS, Bldg. 725, Training Room

Speakers

Maria Zawadowicz, EBNN - ML for Atmospheric Aerosol Research

Mohammad Atif, CDS - An Extensible Digital Twin Framework

Guang Zhao, CDS - Pareto Prompt Optimization

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1615289117?pwd=Hqkbj9itxWrFnkhZ8rQXHPInO2gxdF.1

Meeting ID: 161 528 9117
Passcode: 991382

Abstract: Recent studies have highlighted the vulnerability of Natural Language Processing (NLP) models and Vision-Language Models (VLMs) to backdoor attacks, posing significant security risks. Understanding these attack strategies is crucial for assessing model robustness and developing effective defenses. This thesis proposal aims to investigate the vulnerability of language and vision-language models, analyze abnormal behaviors in backdoor-attacked models, and develop defense methods to enhance the safety of modern machine learning models at deployment.


We investigate the internal mechanisms of backdoored NLP models, identifying a distinct attention focus drifting phenomenon, where trigger tokens hijack attention regardless of the input context. Through comprehensive qualitative and quantitative analysis, we provide insights into the underlying mechanisms that enable backdoor attacks. Building on these insights, we propose detection methods to differentiate backdoored models from clean ones, through inspecting both the attention distribution and the model predictions. To better understand the vulnerability, we develop advanced backdoor attack strategies targeting language models in classification tasks. For BERT variants, we introduce Trojan Attention Loss (TAL), a novel method that directly manipulates attention patterns to enhance backdoor effectiveness, ensuring stealth and robustness. Vision-Language Models have demonstrated strong performance in recent years. Yet their vulnerability is largely underexplored. We investigate advanced backdoor attack strategies on Vision-Language Models, focusing on image-to-text generation tasks. We demonstrate how backdoors can be embedded in complex multimodal tasks while maintaining semantic integrity under poisoned inputs. Additionally, we propose innovative techniques for injecting backdoors without requiring access to the original training data, expanding the feasibility of real-world attacks.

This proposal provides novel insights into the internal mechanisms of backdoored models, proposes effective detection strategies, and develops advanced attack techniques that expose critical vulnerabilities. These findings underscore the urgent need for robust security measures to defend against emerging backdoor threats in deep learning models. The results have been published in top venues including ICLR, ECCV, NAACL, and EMNLP.

Speaker: Weimin Lyu


Zoom link: https://stonybrook.zoom.us/j/99880605139?pwd=cfWbRG6n9v3GXEa7OqvXa5cOp5eLBv.1
Meeting ID: 998 8060 5139
Passcode: 843302
CSE 600 Seminar Series | Fall 2025



Abstract:

We often talk about AI as if it begins with a dataset and ends with an application. But behind every model lie two sets of actors who are rarely acknowledged in technical documentation: the workers who train AI systems and the researchers who try to make sense of them. This talk brings both groups into view.
Dr. Ben Zhang will offer an on-the-ground examination of the prevailing values and invisible labor that underpin commercial AI production and data production. Drawing on ethnographic research inside AI data annotation centers in China, he introduces the concept of precision labor to unpack the labor dimension of constructing, managing, and performing technical accuracy. This concept highlights the hidden and excessive labor required to reconcile the ambiguity and uncertainty involved in AI training. A precision labor lens challenges the legitimacy and sustainability of the relentless pursuit of technical accuracy, raising new questions about its consequences and implications.
On the other end of the pipeline, as LLMs become embedded in society, social scientists like Dr. Jieshu Wang are scrutinizing their potential biases while employing them as research tools. She will present her recent work auditing LLM responses across different contexts, revealing that LLMs exhibit varying levels of environmental awareness and disproportionately reward institutional prestige in peer-review simulations. She also demonstrates how LLMs can serve as useful tools in social-science pipelines, e.g., extracting location information, inferring demographics, parsing citations, mapping social networks, and analyzing occupational data.
By placing these two worlds side by side - the labor of training AI and the scholarly efforts to study it - we show why responsible AI should go beyond the deployment phase and its emphasis on fairness audits and model explainability. It requires reimagining the values, labor regimes, and social science practices that shape AI systems from annotation to analysis.


Bios:

Dr. Jieshu Wang is an interdisciplinary researcher studying the human and social dimensions of artificial intelligence (AI) and how people can thrive in an AI-integrated future. She combines computational methods with qualitative insights to trace technology trends and understand their broader societal impact. She earned her Ph.D. in Human and Social Dimensions of Science and Technology from Arizona State University, after earlier degrees in Civil Engineering, Economics, and Science and Technology Studies. She has also worked as a patent examiner and as an editor at a popular science magazine, and she co-founded Synced (机器之心), an AI-focused media company in China. Her research looks both backward and forward. Backward-looking, she examines how AI systems are created, who creates them, and who is missing from the process. Forward-looking, she studies how AI is transforming the way we live, connect, invent, work, and adapt, as well as how AI might help address challenges such as climate change and workforce transitions.
Dr. Ben Zhang is an Assistant Professor in the Department of Technology. His research explores the production and sociotechnical impacts of AI systems in critical areas such as work, health, and sustainability. Drawing from his background in Human-Computer Interaction (HCI), Human-Centered AI, and Science and Technology Studies (STS), he employs a life-cycle-centered approach to holistically examine the promises and harms of these systems and to inform the design of responsible AI infrastructures across their development, deployment, and governance. Ben received his Ph.D. in Information Science from the University of Michigan. Ben's work has been supported by competitive awards and fellowships, including the University of Michigan Rackham Predoctoral Fellowship and the Weizenbaum Fellowship. His research has appeared in premier computing venues, including ACM CHI, ACM CSCW, and AAAI ICWSM.

Location: NCS 120
Johannes Hachmann, Assistant Professor of Chemical Engineering at the University at Buffalo, presents "Making Machine Learning Work in Chemistry"

The use of modern machine learning, informatics and data mining approaches is a relatively new development in the chemical and materials domain. These techniques have been exceedingly successful in other application fields, and since there is no fundamental reason why they should not have a similarly transformative impact on chemical and materials research, there is now a concerted effort by the community to introduce data science in this new context. However, adapting techniques from other application domains for the study of chemical and materials systems requires a substantial rethinking and redevelopment of the existing methods.

In this presentation, we will discuss our work on designing advanced, physics-infused neural network architectures, the fusion of unsupervised clustering with supervised regression for local ensemble models, active and transfer learning techniques, bootstrapping approaches to minimize our training data footprint, methods to increase the applicability domain of data-derived models and automated hyperparameter optimization.

Biosketch: Johannes Hachmann is an Assistant Professor of Chemical Engineering at the University at Buffalo (UB), the Director of the Engineering Science in Data Science graduate program, a Core Member of the UB Computational and Data-Enabled Science and Engineering graduate program, and a Faculty Member of the New York State Center of Excellence in Materials Informatics. He earned a Dipl.-Chem. degree (2004) after undergraduate studies at the universities of Jena and Cambridge, M.Sc. (2007) and Ph.D. (2010) degrees in Chemistry from Cornell University, and he conducted postdoctoral research at Harvard University before joining the UB faculty in 2014. The research of the Hachmann Group fuses (first-principles) molecular and materials modeling with virtual high-throughput screening and modern data science (i.e., the use of database technology, machine learning and informatics) to advance a data-driven discovery and rational design paradigm in the chemical and materials disciplines. One of the centerpieces of the group's efforts is the creation of an open, general-purpose software ecosystem for the data-driven design of chemical systems and the exploration of chemical space. This work was recognized with a 2018 NSF CAREER Award.