ICB&DD 19th Annual Symposium

Iwao Ojima, Director, ICB&DD
Ivet Bahar, Chair, Organizing Committee
Dima Kozakov, Co-Chair, Organizing Committee

There will be poster sessions on projects conducted in ICB&DD members' laboratories as well as other laboratories in the area. Awards will be given to the three best posters.

Please see the following links for registration and the poster sessions:
https://www.stonybrook.edu/commcms/icbdd/
https://forms.gle/Wh4UzVx9U4HWStXb8

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Tuesday, January 7, 2025, 12:00 pm -- CDS, Bldg. 725, Training Room

Speakers

Jianda Chen, EBNN - Improving the stability and accuracy of PDE-ML hybrid AGCMs

Boyang Li, CDS - Accelerating Materials Discovery using Machine Learning

Jaehyeon Do, NPP Isotopes - Using LLMs for Isotopes Research and Production

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1615289117?pwd=Hqkbj9itxWrFnkhZ8rQXHPInO2gxdF.1

Meeting ID: 161 528 9117
Passcode: 991382

Abstract:

Capturing the spatio-temporal (4D) dynamics of humans has been a long-standing research problem in computer vision and graphics. Synthesizing photorealistic human avatars has broad applications, ranging from immersive telepresence in AR/VR and the movie industry to enriching the education and healthcare systems. Earlier approaches relied on hand-engineered models that use a small amount of data from one or more subjects. With the advent of neural networks, training on large datasets enhanced the output visual quality. Currently, the combination of neural networks with graphics techniques has achieved natural-looking human animation. However, most approaches are identity-specific, trained only on a single identity, and use only one modality.

In this thesis, we address the problem of learning neural representations of humans in a holistic way. Given that the video data in the real world include multiple modalities (audio and video) and multiple identities, we develop multi-modal and multi-identity representations. First, we propose to reconstruct the 4D face geometry of humans by leveraging both audio and video information. In this way, the network produces accurate lip shapes and is robust to cases where either modality is insufficient. Next, we introduce a NeRF-based representation for audio-driven human face animation that achieves high-quality lip synchronization for cinematic content. Since humans communicate with their full body, combining body pose, hand gestures, and facial expressions, we extend our network to capture full-body human motion for multiple identities simultaneously. In order to better disentangle identity-specific and non-identity-specific information, we subsequently study non-linear interactions between latent factors of variation, and propose a specific multiplicative module. In this way, we learn a multi-identity NeRF that robustly animates human faces under novel expressions and achieves a significant decrease in the total training time. Similarly, we propose a multi-identity Gaussian splatting representation for human bodies, by constructing a high-order tensor. Assuming a low-rank structure, we learn a tensor decomposition that leads to a significant decrease in the total number of learnable parameters, as well as to robust animation under novel poses. In the future, we propose to jointly synthesize audio and visual outputs from just text input. Given the recent rise of large language models, coupling text with natural-looking avatars can enhance the overall interaction between a human and an AI system.

Speaker: Aggelina Chatziagapi

Where: NCS, Room 220

Zoom link: https://stonybrook.zoom.us/j/98775312249?pwd=uORNAnSdcssrPZdqOsqaMAF5aLcRD9.1
ID: 98775312249
Passcode: 505777

Abstract: Recent studies have highlighted the vulnerability of Natural Language Processing (NLP) and Vision-Language Models (VLMs) to backdoor attacks, posing significant security risks. Understanding these attack strategies is crucial for assessing model robustness and developing effective defenses. This thesis proposal aims to investigate the vulnerability of language and vision-language models, analyze abnormal behaviors in backdoor-attacked models, and develop defense methods to enhance the safety of modern machine learning models at deployment.

We investigate the internal mechanisms of backdoored NLP models, identifying a distinct attention focus drifting phenomenon, where trigger tokens hijack attention regardless of the input context. Through comprehensive qualitative and quantitative analysis, we provide insights into the underlying mechanisms that enable backdoor attacks. Building on these insights, we propose detection methods to differentiate backdoored models from clean ones, through inspecting both the attention distribution and the model predictions. To better understand the vulnerability, we develop advanced backdoor attack strategies targeting language models in classification tasks. For BERT variants, we introduce Trojan Attention Loss (TAL), a novel method that directly manipulates attention patterns to enhance backdoor effectiveness, ensuring stealth and robustness. Vision-Language Models have demonstrated strong performance in recent years. Yet their vulnerability is largely underexplored. We investigate advanced backdoor attack strategies on Vision-Language Models, focusing on image-to-text generation tasks. We demonstrate how backdoors can be embedded in complex multimodal tasks while maintaining semantic integrity under poisoned inputs. Additionally, we propose innovative techniques for injecting backdoors without requiring access to the original training data, expanding the feasibility of real-world attacks.

This proposal provides novel insights into the internal mechanisms of backdoored models, proposes effective detection strategies, and develops advanced attack techniques that expose critical vulnerabilities. These findings underscore the urgent need for robust security measures to defend against emerging backdoor threats in deep learning models. The results have been published in top venues including ICLR, ECCV, NAACL, and EMNLP.

Speaker: Weimin Lyu


Zoom link: https://stonybrook.zoom.us/j/99880605139?pwd=cfWbRG6n9v3GXEa7OqvXa5cOp5eLBv.1
Meeting ID: 998 8060 5139
Passcode: 843302

CSE 600 Seminar Series | Fall 2025


Abstract: Vision-language models that see and describe the world are now part of our daily lives, from internet search and accessibility tools to content generation and automatic moderation. However, as these models grow and become more widely used, their limitations have also become increasingly visible. In particular, it has been shown that these models are unable to reliably perform complex tasks that require abstraction and compositional reasoning. For example, they struggle to decompose an image or text into entities, attributes, and relations, and then reason over new combinations of these elements. As a result, we see generated content full of hallucinations, privacy leaks in images, and different types of biases in the model outputs.

In this talk, I will outline a research agenda that aims to build trustworthy vision-language models in the age of generative AI. I will begin with compositional reasoning: how natural language inference can be used to decompose complex instructions and captions into atomic, verifiable statements, improving both evaluation and model behavior on tasks that require multi-step reasoning. I will then discuss how synthetic data and simulated environments can be used to train more reliable models, and how they can also stress-test models beyond standard benchmarks, revealing when models drop attributes, break object relations, or fail under distribution shifts. I will also share recent work on using hallucination correction as a signal to improve video-language alignment, and on privacy-preserving image understanding for blind and low-vision users. I will conclude with possible ways we can systematically probe, debug, and repair these models, turning synthetic perception into something we can trust in real-world deployments.



Speaker: Paola Cascante-Bonilla is a tenure-track Assistant Professor in the Department of Computer Science at Stony Brook University (SUNY). Before that, she was a Postdoctoral Associate at the University of Maryland Institute for Advanced Computer Studies (UMIACS), developing methods and metrics related to trustworthy machine learning. She received her Ph.D. in Computer Science at Rice University in 2024, working on Computer Vision, Natural Language Processing, and Machine Learning. Her research focuses on developing systems that enable compositional reasoning and common-sense inference through vision and language, while tackling issues such as cultural biases, data distribution, explainability, and trustworthy AI. Additionally, Cascante-Bonilla creates simulated environments for embodied agents to learn in a safe, controlled setting, aiming to facilitate effective collaboration and problem-solving for complex tasks by leveraging the implicit knowledge of large-scale pre-trained deep learning models.
Cascante-Bonilla is the recipient of the Ken Kennedy Institute SLB Graduate Fellowship (2022/23). She was selected as a Future Faculty Fellow by Rice's George R. Brown School of Engineering (2023) and as a Rising Star in EECS (2023).

Location: NCS 120

Abstract:

Many real-world complex problems are multi-step reasoning tasks. These range from analytic tasks such as answering questions to automation tasks where agents complete tasks on behalf of users. Evaluation, datasets, and models for such tasks can be unreliable for multiple reasons. (i) Datasets often have annotation artifacts and biases, allowing models to take reasoning shortcuts. Such shortcuts can allow models to make effective guesses -- or, in a sense, cheat -- to achieve high performance without any multi-step reasoning. This issue is further exacerbated for complex tasks because as the number of required reasoning steps increases, so do the avenues for bypassing those steps. (ii) Models trained on such datasets learn to solve the task by taking reasoning shortcuts instead of proper multi-step reasoning. As a result, these models are not robust (reliable) when evaluated in an out-of-distribution setting. (iii) Lastly, recent works have shown that language models can solve complex multi-step tasks by producing a step-by-step explanation without any training. However, these methods often hallucinate factually incorrect (i.e., unreliable) explanations when posed with knowledge-intensive tasks.

I address these challenges by carefully characterizing the requirements of robust multi-step reasoning and designing reliable evaluation datasets and training methods that necessitate thorough multi-step reasoning. In DiRe, I first formalize and introduce Disconnected Reasoning, i.e., reasoning that allows models to arrive at the correct answer by bypassing necessary reasoning steps, and use this formalization to measure how much multi-step reasoning a model does on a dataset. In MuSiQue, I built a multi-step reasoning dataset for QA from scratch that avoids cheatability via disconnected reasoning, providing a more reliable evaluation. In TeaBReaC, I developed a synthetically generated multi-step QA pretraining dataset designed to force models to avoid disconnected reasoning and learn reliable multi-step reasoning. In IRCoT, I address the reliability of model-generated multi-step reasoning chains by interleaving models' step-by-step reasoning with a step-by-step retrieval from an external corpus, resulting in more factually correct reasoning. Finally, in AppWorld, I built a multi-step reasoning dataset that requires highly interactive problem-solving in an environment carefully designed to ensure models need thorough reasoning to succeed.

Speaker: Harsh Trivedi

Location: NCS 220 or Zoom

https://stonybrook.zoom.us/j/99096379762?pwd=zYCJZQVxRuZd9BboscO4nlodCwsKBr.1

CSE 600 Seminar Series | Fall 2025


Abstract: Large reasoning models have demonstrated capabilities to solve competition-level math problems, answer deep research questions, and address complex coding needs. Much of this progress has been enabled by scaling of data: pre-training data to learn vast knowledge, fine-tuning data to learn natural language reasoning, and RL environments to refine that reasoning. In this talk, I will describe the current LLM reasoning paradigm, its boundaries, and the future of LLM reasoning beyond scaling. First, I will describe the state of reasoning models and where I think scaling can lead to some additional (though perhaps limited) successes. I will then shift to discussing more fundamental issues with models that scale will not resolve in the next few years. I will touch on four current limitations: outdated knowledge, generator-validator gaps, limited creativity, and poor compositional generalization. In all cases, fundamental limitations of LLMs or of supervised learning in general make these problems challenging, inviting future study and novel solutions beyond scaling.

Bio: Greg Durrett is an associate professor in the Department of Computer Science and the Center for Data Science at New York University. His research is broadly in the areas of natural language processing and machine learning. Currently, his group's focus is on reasoning about knowledge in text, verifying correctness of generation methods, and studying how to make progress on problems that defy LLM scaling. He is a 2023 Sloan Research Fellow and a recipient of a 2022 NSF CAREER award. He has served in numerous roles for ACL conferences, recently as a member of the NAACL Board since 2024 and as Senior Area Chair for ACL 2025 and EMNLP 2025. He received his BS in Computer Science and Mathematics from MIT and his PhD in Computer Science from UC Berkeley, where he was advised by Dan Klein.

CSE 600 Talk: Squeezing Software Performance via Eliminating Wasteful Operations, presented by Xu Liu

ABSTRACT: Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as wasteful memory operations, such as redundant or useless memory loads and stores. Aliasing, limited optimization scopes, and insensitivity to input and execution contexts act as severe deterrents to static program analysis. Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize redundancies that masquerade in complex programs. In this talk, I will describe various wasteful memory operations, which pervasively exist in modern software packages and expose great potential for optimization. I will discuss the design of a fine-grained instrumentation-based profiling framework that identifies wasteful operations in their contexts, which guides nontrivial performance improvement. Furthermore, I will show our recent improvement to the profiling framework by abandoning instrumentation, which reduces the runtime overhead from 10x to 3% on average. I will show how our approach works for native binaries and various managed languages such as Java, yielding new performance insights for optimization.

BIO: Xu Liu is an assistant professor in the Department of Computer Science at the College of William & Mary. He obtained his PhD from Rice University in 2014 and joined the College of William & Mary in the same year. Prof. Liu works on building performance tools to pinpoint and optimize inefficiencies in HPC code bases. He has developed several open-source profiling tools, which are used worldwide at universities, DOE national laboratories, and industrial companies. Prof. Liu has published a number of papers in high-quality venues. His papers received Best Paper Awards at SC'15, PPoPP'18, PPoPP'19, and ASPLOS'17 Highlights, as well as a Distinguished Paper Award at ICSE'19. His recent ASPLOS'18 paper was selected for ACM SIGPLAN Research Highlights in 2019 and nominated for CACM Research Highlights. Prof. Liu is the recipient of the 2019 IEEE TCHPC Early Career Researchers Award for Excellence in High Performance Computing. Prof. Liu has served on the program committees of conferences such as SC, PPoPP, IPDPS, CGO, HPCA, and ASPLOS.

Abstract: Formalization of mathematics is the process by which pen-and-paper mathematics is translated into a strict chain of logical deductions down to the axioms of mathematics. The subject has seen renewed interest in recent decades thanks to the development of computer systems called proof assistants, which make this feasible in practice.
There have now been several examples of high-profile mathematical results which have been formalized. In principle, any mathematical domain is accessible. However, existing projects are skewed towards algebra instead of analysis. Notable exceptions are a project which formalized enough of Gromov's convex integration theory to deduce Smale's sphere eversion theorem and the ongoing project to formalize Carleson's convergence theorem for Fourier series.
This workshop will bring together formalization experts and interested mathematicians to give a new impulse to formalization of analysis (in a very broad sense), and to develop abstractions and tools to deduplicate effort.

Application Information: ICERM welcomes applications from faculty, postdocs, graduate students, industry scientists, and other researchers who wish to participate. Some funding may be available for travel and lodging. Graduate students who apply must have their advisor submit a statement of support in order to be considered.

The deadline to apply for this workshop is January 24, 2026.

https://icerm.brown.edu/program/topical_workshop/tw-26-ttfa