Abstract: The rapid growth of observational data presents unprecedented opportunities to enhance both the predictability and mechanistic understanding of Earth systems. However, fully harnessing big Earth data requires computational frameworks that bridge the gap between physics-based models and machine learning. In this talk, I will first demonstrate how AI methods can significantly improve the prediction of environmental systems. Despite their predictive accuracy, machine learning models often lack physical interpretability, which limits their usefulness for scientific inquiry. To address this, I will introduce a hybrid, differentiable modeling framework that unifies physical models with machine learning in an end-to-end trainable system. This framework learns autonomously from large observational datasets while maintaining physical clarity. The machine learning components can be seamlessly embedded into physical backbones to assimilate multi-source data, support automatic parameterization, and represent uncertain processes. I will showcase applications of this framework in simulating and understanding the terrestrial water cycle and its interactions with ecosystems at continental and global scales. This talk will highlight how differentiable modeling not only improves modeling capability in both data-rich and data-scarce scenarios, but also provides a systematic pathway to enhancing model structures, deciphering uncertain physical relations, and facilitating knowledge discovery in Earth system sciences.


IACS Seminar Speaker: Dapeng Feng, Stanford University

Location: IACS Seminar Room

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Tuesday, January 7, 2025, 12:00 pm -- CDS, Bldg. 725, Training Room

Speakers

Maria Zawadowicz, EBNN--ML for Atmospheric Aerosol Research

Mohammad Atif, CDS--An Extensible Digital Twin Framework

Guang Zhao, CDS--Pareto Prompt Optimization

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1615289117?pwd=Hqkbj9itxWrFnkhZ8rQXHPInO2gxdF.1

Meeting ID: 161 528 9117
Passcode: 991382

Title: Cultural Biases, World Languages, and User Privacy in Large Language Models
Abstract: In this talk, I will highlight three key aspects of large language models: (1) cultural bias in LLMs and pre-training data, (2) a decoding algorithm for low-resource languages, and (3) human-centered design for real-world applications.

The first part focuses on systematically assessing LLMs' favoritism towards Western culture. We take an entity-centric approach to measure the cultural biases among LLMs (e.g., GPT-4, Aya, and mT5) through natural prompts, story generation, sentiment analysis, and named entity tasks. One interesting finding is that a potential cause of cultural biases in LLMs is the extensive use and upsampling of Wikipedia data during the pre-training of almost all LLMs. The second part will introduce a constrained decoding algorithm that can facilitate the generation of high-quality synthetic training data for fine-grained prediction tasks (e.g., named entity recognition, event extraction). This approach outperforms GPT-4 on many non-English languages, particularly low-resource African languages. Lastly, I will showcase an LLM-powered privacy preservation tool designed to safeguard users against the disclosure of personal information. I will share findings from an HCI user study that involves real Reddit users utilizing our tool, which in turn informs our ongoing efforts to improve the design of AI models.
Bio: Wei Xu is an Associate Professor in the College of Computing and Machine Learning Center at the Georgia Institute of Technology, where she is the director of the NLP X Lab. Her research interests are in natural language processing and machine learning, with a focus on Generative AI, robustness and fairness of large language models, multilingual LLMs, as well as AI for science, education, accessibility, and privacy research. She is a recipient of the NSF CAREER Award, Google Academic Research Award, CrowdFlower AI for Everyone Award, and Best Paper Awards and Honorable Mentions at COLING'18, ACL'23, and ACL'24. She has also received research funds from DARPA and IARPA. She is currently an executive board member of NAACL.

Join Zoom Meeting: https://stonybrook.zoom.us/j/98855994362?pwd=F2qnpwL85fhCBHAEW9ZBpXihfwGHsj.1

Meeting ID: 98855994362
Passcode: 172797
Join by phone (US): +1 646-876-9923 (passcode: 172797)
Meeting host: H.Andrew.Schwartz@stonybrook.edu


The next AI Institute seminar speaker will be Chao Chen of Biomedical Informatics, on Monday, November 29, at noon via Zoom:

https://stonybrook.zoom.us/j/96233844681?pwd=aVVsUnIzMWJDMHRqVXcrQU5HMjFVQT09

He will speak on "Detection of Trojan Attacks to Deep Neural Networks - A Topological Perspective"; his abstract and bio are below.


Abstract: Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when attackers stealthily manipulate the model's behavior through Trojaned training samples, i.e., samples with a special trigger injected and labels altered. Identifying a Trojaned model at deployment is challenging due to limited access to the training data. We propose to identify Trojaned neural networks using methods from topological data analysis. In particular, we propose to (1) inspect high-order topological features of the neuron interactions and (2) reverse engineer the injected triggers using a topological loss. These approaches take different angles and reveal insights into the behavior of neural networks when their strong memorization power is exploited maliciously. The work has been accepted to NeurIPS'21. I will also briefly mention other research directions from my group, including incorporating topological information into deep image analysis, topology-inspired graph neural networks, and robust training of neural networks with label noise. These works have been published in ICLR, ICML, NeurIPS, ECCV, ICCV and AAAI in recent years.
Bio: Dr. Chao Chen is an assistant professor of Biomedical Informatics at Stony Brook University. His research interests span topological data analysis (TDA), machine learning and biomedical image analysis. He develops principled learning methods inspired by the theory from TDA, such as persistent homology and discrete Morse theory. These methods address problems in biomedical image analysis, robust machine learning, and graph neural networks from a unique topological view. His research results have been published in major machine learning, computer vision, and medical image analysis conferences. He is serving as an area chair for MICCAI, AAAI, CVPR and NeurIPS.

The Future Histories Studio at Stony Brook University and Guggenheim New York are collaborating to present a day-long symposium on October 24 at the Simons Center for Geometry and Physics. This conference will explore urgent questions at the intersection of artificial intelligence, machine learning, and the human, and is co-organized by Noam Segal, LG Electronics Associate Curator at Guggenheim New York. In this role, Noam plays an important part in researching these topics, promoting a deeper understanding of the ways in which contemporary artists use new technologies, and developing and supporting the Guggenheim's engagement with technology-based art under the LG Guggenheim Art and Technology Initiative.

The event examines the profound transformations brought by automation--how AI compels us to rethink cognition, agency, and the conditions of reason itself. As these systems become ever more embedded in daily life--largely invisible yet deeply consequential--they challenge the very foundations of subjectivity and governance. We are surrounded by logics we cannot fully access, yet which shape our realities, while new forms of alterity arise--distinct modes of reasoning that propose collective unknowns beyond established frameworks of knowledge.

This emerging terrain invites us to consider cognitive plurality, where biological and technological intelligences generate new categories, concepts, and understandings. Capacities once unique to humans--art, authorship, judgment, invention--are now co-articulated with systems of computation and planetary-scale infrastructure. The symposium brings together artists, scholars, and technologists to probe the cultural, philosophical, and ecological implications of this entanglement.

The concept of neurodiversity has shown that neurological differences such as autism, ADHD, and dyslexia are not deficits but variations that enrich collective life. Extending this to machines can be provocative: just as neurodivergence unsettles fixed definitions of intelligence, so too AI challenges anthropocentric assumptions about cognition. Yet the analogy is limited. Neurodiversity is rooted in the lived struggles of human communities, while machines neither think nor struggle. Human cognition involves perception, learning, memory, and reasoning through embodied experience. Machine cognition, by contrast, is computational pattern recognition and statistical modeling, without consciousness or lived context, and with only narrow forms of sensing.

For this reason, the symposium advances a broader framework of cognitive diversity or technodiversity--a recognition of proliferating intelligences, human, machinic, and hybrid, as part of a shared ecology. This shift calls for new models of creativity, responsibility, and collaboration that honor the irreducibility of human thought while engaging the radical alterity of machine logics.

Location: Stony Brook Simons Center for Geometry and Physics, Della Pietra Family Auditorium

This event is co-organized by the Guggenheim New York.

Zoom Link: https://stonybrook.zoom.us/j/98533029054?pwd=5FXO6lWGTJssCADEYkYbA7sjaacPRX.1

Meeting ID: 985 3302 9054

Passcode: 436997

Abstract:

Semantic segmentation, the task of assigning a semantic label to each pixel in an image, is a fundamental problem in the field of Computer Vision, with crucial applications in domains like autonomous driving, drone imagery and medical image analysis. Despite advancements in deep learning architectures, state-of-the-art models still heavily depend on large-scale pixel-level annotations, which are costly and time-consuming to acquire. To address this issue, Semi-Supervised Segmentation (SSS) has emerged as a promising solution, leveraging a small set of labeled images alongside a larger corpus of unlabeled data to reduce the annotation burden. In this proposal, I aim to investigate the challenges of SSS and propose approaches to address them. Existing SSS methods rely on a teacher-student framework to generate pseudo-labels for unlabeled images, which are then used for model training. However, this approach presents two major challenges: pixel-level consistency fails to effectively capture contextual information, and pseudo-labels are noisy, especially in the early stages of training. To address the challenge of noisy pseudo-labels, existing methods rely on confidence-based thresholding to identify reliable pseudo-labels. However, during early training phases, when the model is poorly calibrated, this approach can select high-confidence but noisy pseudo-labels. To address this, we propose a novel approach that reduces reliance on model confidence to select reliable pseudo-labels. Our method employs an ensemble of a segmentation model and an object detection model to select more reliable pseudo-labels, which are then used to weight pseudo-labels using rank statistics, reducing the influence of noisy labels in training.
Next, to address both the challenge of capturing contextual information and that of noisy pseudo-labels, I introduce a novel Multi-scale Patch-based Multi-label Classifier (MPMC), which incorporates patch-level contextual information and reduces the impact of noisy pixel pseudo-labels by using the predictions of the patch-level multi-label classifier to detect noisy labels, enhancing overall segmentation performance. While my work so far has focused on effectively utilizing unlabeled data to improve segmentation performance, as part of my future work, I will explore the use of textual information, such as category descriptions, for segmentation tasks. In scenarios with limited labeled data, it is more challenging to align visual features with textual features from large language models (LLMs).
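
For readers unfamiliar with the confidence-based thresholding baseline mentioned in the abstract, the following is a minimal sketch of the general idea, not the speaker's method. The function name `select_pseudo_labels`, the 0.95 threshold, and the toy probabilities are all assumptions made for illustration.

```python
from typing import List, Optional


def select_pseudo_labels(
    probs: List[List[float]], threshold: float = 0.95
) -> List[Optional[int]]:
    """Keep a pixel's pseudo-label only if the teacher is confident enough.

    probs: one class-probability vector per pixel (hypothetical teacher output).
    Returns the argmax class for confident pixels and None for the rest,
    so that unconfident pixels can be ignored in the unsupervised loss.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        confidence = max(p)
        labels.append(p.index(confidence) if confidence >= threshold else None)
    return labels


# Toy teacher predictions for three pixels over three classes (made-up values).
teacher_probs = [
    [0.97, 0.02, 0.01],  # confident: kept as class 0
    [0.40, 0.35, 0.25],  # uncertain: dropped
    [0.03, 0.96, 0.01],  # confident: kept as class 1
]

print(select_pseudo_labels(teacher_probs))
```

As the abstract notes, a poorly calibrated model early in training can assign high confidence to wrong classes, so a filter like this one can still admit noisy pseudo-labels.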

The Department of AI and Society (AIS) at the University at Buffalo is hosting a two-day AI and Society Workshop focused on building AI systems by society, for society. This workshop brings together researchers and community organizers to explore how AI systems can be developed through meaningful collaboration across disciplines.

Topics include:

  • Labor and AI
  • Public services and AI
  • Community-centered AI systems
  • Intersections of humanities, social sciences, arts, and computing

The vision of UB's Department of AI and Society is to create a future where AI systems are built by society, for society. AIS centers community engagement at every stage of AI development through collaboration across disciplines and sectors. AIS was established with a $5 million grant from SUNY, and this workshop is made possible through that support.

Who Should Attend?

  • Researchers
  • Students
  • Community organizers
  • Practitioners interested in AI's societal impact
