CSE 600 Seminar Series | Fall 2025

Abstract: Imagine machines that can see the invisible: drones locating wildfire survivors, cameras predicting building failures, and smartphones detecting skin tumors. These applications lie beyond today's vision systems, which focus only on human-visible information. In this talk, I argue that a wealth of scene information is hidden in light properties invisible to the human eye, such as the travel time of photons and polarization of light waves. I will present how co-designing camera hardware, graphics models, and learning algorithms unlocks these invisible properties to create superhuman vision systems. I will present three superhuman vision capabilities: seeing around blind corners, turning objects into cameras, and extracting internal stress fields. By analyzing faint light reflections on diffuse walls and shiny objects, we create virtual cameras that reveal scenes hidden from the line of sight - enabling autonomous systems to navigate safely. Using the polarization of light, we recover mechanical stress fields hidden inside objects - opening new possibilities for non-destructive material characterization. These capabilities point toward a future where machines can see the invisible: around us, beneath our bodies, and beyond our scientific understanding.

Bio:
Akshat Dave is an Assistant Professor in the Department of Computer Science at Stony Brook University, USA. His research lies at the intersection of applied optics, computer vision, and machine learning. His work has been recognized by Rice University's Best Thesis Award, the Optica Best Paper Prize, the SIGGRAPH Asia Doctoral Consortium, and fellowships from Qualcomm, Texas Instruments, and the INK Global Foundation. Prior to Stony Brook, he was a Postdoctoral Associate at the MIT Media Lab. He holds a Ph.D. from Rice University and a Master's and a Bachelor's from the Indian Institute of Technology Madras.

Reception to follow.

Abstract:
In this talk, I will present our journey of developing diverse, adaptive, uncertainty-calibrated AI planning agents that can robustly communicate and collaborate for multi-agent reasoning (on math, commonsense, coding, etc.) as well as for interpretable, controllable multimodal generation (across text, images, videos, audio, layouts, etc.). In the first part, we will discuss improving reasoning via multi-agent discussion among diverse LLMs and structured distillation of these discussion graphs (ReConcile, MAGDi), adaptively learning to balance abstraction, decomposition, refinement, and fast+slow thinking in LLM-agent reasoning (ReGAL, ADaPT, MAgICoRe, System-1.x), as well as confidence calibration in LLMs via speaker-listener pragmatic reasoning and making LLMs better teammates via multi-agent positive-negative persuasion balancing (LACIE, PBT). In the second part, we will discuss interpretable and controllable multimodal generation via LLM-agent-based planning and programming, such as layout-controllable image generation (and evaluation) via visual programming (VPGen+VPEval), consistent multi-scene video generation via LLM-guided planning (VideoDirectorGPT), interactive and composable any-to-any multimodal generation (CoDi, CoDi-2), as well as feedback-driven multi-agent interaction for adaptive environment/data generation via weakness discovery (EnvGen, DataEnvGym).
Bio:
Dr. Mohit Bansal is the John R. & Louise S. Parker Distinguished Professor and the Director of the MURGe-Lab (UNC-NLP Group) in the Computer Science department at UNC Chapel Hill. He received his PhD from UC Berkeley in 2013 and his BTech from IIT Kanpur in 2008. His research expertise is in natural language processing and multimodal machine learning, with a particular focus on multimodal generative models, grounded and embodied semantics, faithful language generation, and interpretable, efficient, and generalizable deep learning.
CSE 600 Seminar Series | Fall 2025


Abstract: The first part of the presentation focuses on the fundamental role that failures play in the Ph.D. journey, highlighting how they offer invaluable learning experiences to build resilience, critical thinking, and adaptability. Rather than being viewed as signs of inadequacy, failures should be recognized as opportunities to learn, re-evaluate, and develop the persistence needed for success in a high-stakes research environment. In the second part of the presentation, we take a quick look at the evolution of distributed database research at Stony Brook and then focus on different challenges associated with distributed transaction processing systems functioning in untrustworthy environments. Byzantine Fault-Tolerant (BFT) protocols have recently been extensively used by distributed transaction processing systems to establish consensus on the order of transactions. However, the proliferation of different BFT protocols has made it difficult to navigate the BFT landscape, let alone determine the protocol that best meets application needs. Moreover, as novel applications, modern hardware, and new cloud platforms arise, distributed transaction processing systems need to be designed with full-stack adaptivity in mind. This presentation discusses our vision for a reinforcement learning (RL)-based distributed transaction processing system that adjusts effectively in real time to dynamic fault scenarios and evolving workloads.

Bio: Mohammad Javad Amiri is an Assistant Professor in the Department of Computer Science at Stony Brook University. Before joining Stony Brook, he was a postdoctoral researcher in the Computer and Information Science Department at the University of Pennsylvania. He received his Ph.D. in Computer Science from the University of California, Santa Barbara. His research mainly lies at the intersection of data management and distributed systems, focusing on distributed transaction processing, consensus protocols, and blockchains.
Title: Class visual similarity based noisy sample removal in generative Few Shot Learning
Time: Thursday, Feb 4, 11:30am - 1:00pm
Zoom:
https://stonybrook.zoom.us/j/8563646526?pwd=anJna1gzUStXNlNVSUIzdDRUSC9CUT09

Meeting ID: 856 364 6526
Passcode: 203791



Abstract:

Over the past decade, larger datasets, hardware acceleration, and network architecture improvements have contributed to phenomenal achievements in many computer vision tasks. However, in the absence of large datasets, computer vision models struggle to learn general representations, which results in poor performance. Few-shot learning tries to address this problem by proposing models that learn from a few examples.


I first give an overall review of few-shot learning methods. I particularly focus on generative few-shot learning (FSL) methods, which augment the scarce categories in a dataset by generating samples for those rare categories. As the actual class distributions can be complex and lie very close to each other, a sample generated for one class can be noisy or lie close to another class. However, none of the current generative FSL methods performs any form of quality control on the generated samples.


In this work, I propose to identify and remove the generated samples that are less likely to be in the distribution of the few-shot class. Here I particularly deal with few-shot scenarios where prior information about the relationship between the classes based on visual similarity is available. The main idea is to exploit these priors to better identify the unreliable generated samples.


Specifically, I propose two methods based on class relationships to detect noisy generated samples. In the first method, I assume that the embeddings of each class follow a Gaussian distribution. From this assumption, I propose Gaussian Neighborhood (GN), a method to estimate how likely it is that a generated sample is drawn from the estimated distribution of a few-shot class. I evaluate this method on the Hematopoiesis dataset. Simply eliminating samples by thresholding the proposed GN scores improves few-shot classification performance by 5% and 2% in the five-shot and one-shot settings, respectively, compared to the model trained on all generated images.
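
As a rough illustration of the general idea of scoring generated samples under a per-class Gaussian and discarding low-scoring ones, here is a minimal Python sketch. It is not the speaker's GN method, which also uses class visual-similarity priors not described in this abstract. The diagonal covariance, the log-likelihood score, the threshold value, and all function names are assumptions made only for this example.

```python
import math

def fit_diagonal_gaussian(embeddings, eps=1e-6):
    """Estimate a per-dimension mean and variance from a class's embeddings.

    Assumption: a diagonal covariance, chosen only to keep the sketch short.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    var = [sum((e[d] - mean[d]) ** 2 for e in embeddings) / n + eps
           for d in range(dim)]
    return mean, var

def log_likelihood(x, mean, var):
    """Log-density of x under the fitted diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def filter_generated(generated, mean, var, threshold):
    """Keep generated samples whose score is at or above the threshold."""
    return [g for g in generated if log_likelihood(g, mean, var) >= threshold]

# Toy 2-D embeddings standing in for a few-shot class (hypothetical data).
support = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mean, var = fit_diagonal_gaussian(support)

# One generated sample near the class, one far away (hypothetical data).
generated = [[0.5, 0.5], [5.0, 5.0]]
kept = filter_generated(generated, mean, var, threshold=-5.0)
print(kept)
```

In this toy setup the sample far from the class mean falls below the threshold and is dropped. In practice the threshold would have to be tuned, for example on held-out data.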


The GN scores represent the similarity distances from the generated samples to their classes, based on the assumption that each class follows a Gaussian distribution. However, this assumption might be too strict in many scenarios, since real data distributions can be arbitrarily complex. Thus, in my second proposed method, I aim to learn such similarity distances directly from data via metric learning. I propose to train a deep network to regress the similarity distance between a pair of samples. This network is trained using both the class-level visual similarity information and the class labels. This method improves the 1-shot and 5-shot classification performance by 0.5% and 1%, respectively, compared to GN.

Abstract: Recent studies have highlighted the vulnerability of Natural Language Processing (NLP) models and Vision-Language Models (VLMs) to backdoor attacks, posing significant security risks. Understanding these attack strategies is crucial for assessing model robustness and developing effective defenses. This thesis proposal aims to investigate the vulnerability of language and vision-language models, analyze abnormal behaviors in backdoor-attacked models, and develop defense methods to enhance the safety of modern machine learning models at deployment.


We investigate the internal mechanisms of backdoored NLP models, identifying a distinct attention focus drifting phenomenon, where trigger tokens hijack attention regardless of the input context. Through comprehensive qualitative and quantitative analysis, we provide insights into the underlying mechanisms that enable backdoor attacks. Building on these insights, we propose detection methods to differentiate backdoored models from clean ones, through inspecting both the attention distribution and the model predictions. To better understand the vulnerability, we develop advanced backdoor attack strategies targeting language models in classification tasks. For BERT variants, we introduce Trojan Attention Loss (TAL), a novel method that directly manipulates attention patterns to enhance backdoor effectiveness, ensuring stealth and robustness. Vision-Language Models have demonstrated strong performance in recent years. Yet their vulnerability is largely underexplored. We investigate advanced backdoor attack strategies on Vision-Language Models, focusing on image-to-text generation tasks. We demonstrate how backdoors can be embedded in complex multimodal tasks while maintaining semantic integrity under poisoned inputs. Additionally, we propose innovative techniques for injecting backdoors without requiring access to the original training data, expanding the feasibility of real-world attacks.

This proposal provides novel insights into the internal mechanisms of backdoored models, proposes effective detection strategies, and develops advanced attack techniques that expose critical vulnerabilities. These findings underscore the urgent need for robust security measures to defend against emerging backdoor threats in deep learning models. The results have been published in top venues including ICLR, ECCV, NAACL, and EMNLP.

Speaker: Weimin Lyu


Zoom link: https://stonybrook.zoom.us/j/99880605139?pwd=cfWbRG6n9v3GXEa7OqvXa5cOp5eLBv.1
Meeting ID: 998 8060 5139
Passcode: 843302
As artificial intelligence continues to transform higher education and the world beyond, how are students engaging with this change? Join us for a student-led discussion that explores how AI is influencing academic integrity, learning practices, and students' perspectives on its role in future workplaces.

Our panelists will share their experiences and reflections on questions such as:
1. What counts as appropriate and inappropriate use of AI in coursework?
2. How do faculty approach AI and talk about its implications in class?
3. What does AI mean for students' learning and ethical decision-making?
4. How are students building their understanding of AI tools and their potential uses in professional contexts?

This conversation offers an authentic look at how students are navigating the promises and challenges of AI--both in their studies and as they look ahead to applying these technologies responsibly in their fields.

Register here.
ICB&DD 19th Annual Symposium

Iwao Ojima, Director, ICB&DD
Ivet Bahar, Chair, Organizing Committee
Dima Kozakov, Co-Chair, Organizing Committee

There will be poster sessions on projects conducted in ICB&DD members' laboratories as well as other laboratories in the area. Awards will be given to the three best posters.

Please see the following links for registration and poster sessions:
https://www.stonybrook.edu/commcms/icbdd/
https://forms.gle/Wh4UzVx9U4HWStXb8