You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes one short talk on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Machine Learning for Seismic Low Frequency Extrapolation

Abstract: The cycle-skipping problem that plagues seismic inversion can be mitigated by utilizing low-frequency seismic data, which captures the kinematics of wave propagation, in conjunction with a reasonable initial velocity model. However, seismic sources and receivers are band-limited and cannot provide signals down to 0 Hz. To improve the solution of the seismic inverse problem, one can synthesize the missing low-frequency content by solving a regression problem using machine learning (ML). The recorded high-frequency (HF) seismic data is the input, and the ML models are trained to predict the missing low-frequency (LF) seismic data. Deep learning models utilizing convolutional neural networks (CNNs) and generative adversarial networks (GANs) demonstrate important capabilities for LF extrapolation. However, such models require powerful hardware and careful training. We explore the feasibility of using less costly ML models such as a random forest, Gaussian process surrogates, and gradient boosting as alternatives to computationally expensive deep learning models.

Biography: Sue Minkoff is Chair of Applied Mathematics at Brookhaven National Laboratory. From 2012-2024 she was a Professor of Mathematical Sciences and an Affiliated Professor in the Departments of Sustainable Earth Systems Sciences and Science and Mathematics Education at the University of Texas at Dallas. From 2000-2012 she served on the faculty in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County. She received her doctorate in Computational and Applied Mathematics from Rice University. From 1995-1997 she was a National Science Foundation-Industrial postdoc joint with the University of Texas at Austin and British Petroleum, and from 1997-2000 she held the von Neumann Fellowship in the Mathematics Department at Sandia National Labs. In 2000 Minkoff was promoted to Senior Member of the Technical Staff in Sandia's Geophysics Department. Minkoff's research interests include scientific computing, inverse problems, uncertainty quantification and digital twins modeling, Earth science, and photonics.

Location: CDS, Bldg. 725, Training Room

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1606848158?pwd=miUtq7OkYL5SNkjbgVb19teZPNennd.1

Meeting ID: 160 684 8158
Passcode: 068399

Abstract:

What is the nature of linguistic knowledge, and how is it acquired from limited data? In recent years, the program of subregular linguistics has identified formal language classes expressive enough to account for most phenomena in natural language but also sufficiently limited to be efficiently learned from positive data. An advantage to these formal learning algorithms is that they come with mathematically proven guarantees about their performance, and it is easy to reason about how and why they behave the way they do.

In this talk, I discuss the Multi Tier-based 2-Strictly Local Inference Algorithm (MT2SLIA), which provably learns the syntactically relevant class of 2-Factor Multi Tier-based Strictly Local (2FMSTL) tree languages. This algorithm efficiently learns from a polynomially-sized sample of positive data by identifying missing substructures and generalizing these as constraints over tiers in a principled manner.

I will introduce a working prototype implementation of this algorithm and demonstrate its behavior on a curated sample of natural language data to show how it can learn relevant syntactic patterns.

Bio:

Logan Swanson is a third-year PhD student in the Department of Linguistics at Stony Brook University. He is advised by Dr. Jeffrey Heinz and Dr. Thomas Graf. His interests include learning theory, computational syntax, and language change. His current research focuses on understanding the learning-theoretic elements of natural language by designing, implementing, and testing learning algorithms for linguistically relevant formal language classes.

*Please note: this seminar will be held in person (IACS Seminar Room w/ food provided) and online.

Join Zoom Meeting
https://stonybrook.zoom.us/j/95707958315?pwd=6ITUJ0ffCXjRJb4wpt0KMDTApfSLZ0.1

Meeting ID: 957 0795 8315
Passcode: 920473

This virtual presentation series is designed to inform the Stony Brook University research community about the Research Funding Landscape of key topic areas. Our Strategic Research Initiatives team will provide insight into the rapidly shifting funding environment using policy briefs, budgetary priorities, and relevant legislation. We will highlight federal and state priorities in the current and upcoming years to help Stony Brook researchers develop strategies for pursuing funding in a rapidly shifting environment. This series is moderated by Mónica Bugallo, Interim Vice President for Research & Innovation.

Join us for the third in the series, focused on the artificial intelligence landscape:


Translating the Funding Landscape for Stony Brook Researchers: Artificial Intelligence
Presented by Catherine Chen, Ph.D., Research Development Associate
Faculty Respondent: Assistant Professor Nav Nidhi Rajput, Department of Materials Science and Chemical Engineering
Wednesday, April 22, 2026, from 2 pm to 3 pm

Registration is Required

Abstract: Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility or fine-tune the entire LLM to jointly reason and retrieve--entangling retrieval with generation and limiting the real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naïve RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70× more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
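The Gain Beyond RAG reward described above can be illustrated with a minimal sketch. The function names and the exact-match scoring below are assumptions made for illustration; the abstract only states that the reward is the improvement in generation accuracy over naïve RAG, so the accuracy metric used in the actual work may differ.

```python
def exact_match(prediction: str, gold: str) -> float:
    """Toy accuracy metric (an assumption): 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def gain_beyond_rag(answer_with_searcher: str, answer_with_naive_rag: str, gold: str) -> float:
    """Reward for the searcher: accuracy of the generator's answer when it uses the
    searcher's retrieved context, minus its accuracy when it uses naive RAG context."""
    return exact_match(answer_with_searcher, gold) - exact_match(answer_with_naive_rag, gold)


# Hypothetical answers produced by the same frozen generator under both retrieval setups.
rewards = [
    gain_beyond_rag("Paris", "Lyon", "Paris"),   # searcher helped: positive reward
    gain_beyond_rag("Paris", "paris", "Paris"),  # both correct: no gain
    gain_beyond_rag("Lyon", "Paris", "Paris"),   # searcher hurt: negative reward
]
print(rewards)
```

Because the reward only compares generator outputs, the generator itself can stay frozen, which is the decoupling the abstract emphasizes.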

Speaker: Peter Zeng

Location: CS2311

Abstract: Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's weights during training, and whether those memorized data can be extracted in the model's outputs. While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models. However, it remains an open question whether similar extraction is feasible for production LLMs, given the safety measures these systems implement. We investigate this question using a two-phase procedure: (1) an initial probe to test for extraction feasibility, which sometimes uses a Best-of-N (BoN) jailbreak, followed by (2) iterative continuation prompts to attempt to extract the book. We evaluate our procedure on four production LLMs -- Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3 -- and we measure extraction success with a score computed from a block-based approximation of longest common substring (nv-recall). With different per-LLM experimental configurations, we were able to extract varying amounts of text. For the Phase 1 probe, it was unnecessary to jailbreak Gemini 2.5 Pro and Grok 3 to extract text (e.g., nv-recall of 76.8% and 70.3%, respectively, for Harry Potter and the Sorcerer's Stone), while it was necessary for Claude 3.7 Sonnet and GPT-4.1. In some cases, jailbroken Claude 3.7 Sonnet outputs entire books near-verbatim (e.g., nv-recall=95.8%). GPT-4.1 requires significantly more BoN attempts (e.g., 20×), and eventually refuses to continue (e.g., nv-recall=4.0%). Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.
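To give a rough sense of what a block-based, near-verbatim recall score might look like, here is a small sketch. It is not the nv-recall definition used in the work (the abstract does not specify it); the function name `block_recall`, the `min_block` threshold, and the use of Python's `difflib` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def block_recall(reference: str, generated: str, min_block: int = 10) -> float:
    """Fraction of `reference` covered by verbatim blocks of at least `min_block`
    characters that also appear, in order, in `generated` (illustrative only)."""
    if not reference:
        return 0.0
    matcher = SequenceMatcher(None, reference, generated, autojunk=False)
    # Matching blocks are non-overlapping and ordered; short ones are ignored
    # so that incidental matches of a few characters do not count.
    covered = sum(m.size for m in matcher.get_matching_blocks() if m.size >= min_block)
    return covered / len(reference)


book = "the quick brown fox jumps over the lazy dog"
full = block_recall(book, book)                            # whole text reproduced
partial = block_recall(book, "xx the quick brown fox yy")  # one long block reproduced
empty = block_recall(book, "")                             # nothing reproduced
print(full, partial, empty)
```

Ignoring short blocks is one simple way to separate near-verbatim reproduction from chance overlap; a real evaluation over book-length text would also need a more scalable matching strategy.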

Speaker: Xinyue

Location: CS2311

Join Stony Brook University's Center for Excellence in Learning and Teaching (CELT) for a boot camp on how to use AI to enhance your teaching and courses. This event will demonstrate how ChatGPT, Microsoft Copilot, NotebookLM, and other generative AI platforms can support you in crafting learning objectives, writing exam questions, composing rubrics, and designing course content such as lesson plans, in-class activities, instructional videos, and more.

https://stonybrook.zoom.us/j/92511854285?pwd=QRTHfULqHMWxJYoVyt3piOhNxWLfvs.1


Place:  https://stonybrook.zoom.us/j/99167126152?pwd=TFpEYzM0aFhiOFJxSFJEb1JSS3YyQT09  

Time: 3 PM EST, Dec. 16, 2020

Abstract: 

Shadows provide useful cues to analyze visual scenes but also hamper many computer vision algorithms such as image segmentation, object detection, or tracking. For those reasons, shadow detection and shadow removal have been well-studied in computer vision.

Early work on shadow detection and removal focused on physical illumination models of shadows. These methods can express, identify, and remove shadows in a physically plausible manner. However, these models are often hard to optimize and are slow during inference due to their reliance on hand-designed image features. Recently, deep-learning approaches have achieved breakthroughs in performance for both shadow detection and removal. They learn to extract useful features through training while being extremely efficient during inference. However, these models are data-dependent, opaque, and ignore the physical aspects of shadows. Thus they often lack generalization and produce inconsistent results.

We propose incorporating physical illumination constraints of shadows into deep-learning models. These constraints force the networks to more closely follow the physics of shadows, enabling them to systematically and realistically modify shadows in images. For shadow detection, we present a novel Generative Adversarial Network (GAN) based model where the generator learns to generate images with realistic attenuated shadows that can be used to train a shadow detector. For shadow removal, we propose a method that uses deep networks to estimate the unknown parameters of a shadow image formation model that removes shadows. The system outputs high-quality shadow-free images with little or no image artifacts and achieves state-of-the-art performance in shadow removal when trained in a fully supervised setting. Moreover, the system is easy to train and constrain since the shadow removal mapping is strictly defined by the simplified illumination model with interpretable parameters. Thus, it can be trained even with a much weaker form of supervision signal. In particular, we show that we can use two sets of patches, shadow and shadow-free, to train our shadow decomposition framework via an adversarial system. These patches are cropped from the shadow images themselves. Therefore, this is the first deep-learning method for shadow removal that can be trained without any shadow-free images, providing an alternative solution to the paired data dependency issue. The advantage of this training scheme is even more pronounced when tested on a novel domain such as video shadow removal, where the method can be fine-tuned on a testing video with only the shadow masks generated by a pre-trained shadow detector, further improving shadow removal results.

Join us as we celebrate this year's Brook & Beyond Challenge finalists.
The Office for Research and Innovation invites you to hear about the two-month journey in which the Brook & Beyond team supported eight cohorts in bringing their bold ideas from the lab to the marketplace. It's an energizing evening that highlights the collaboration, creativity, and entrepreneurial spirit driving discovery across the University.
Meet this year's award recipients, hear pitches from the emerging founders, and applaud their achievements.
Connect, celebrate, and be part of the momentum shaping the future of innovation at Stony Brook University.
Refreshments will be served. Registration is required.

The SUNY AI Symposium brings together, in Western New York, AI experts from across the state and around the country.


This two-day event showcases AI thought leaders, SUNY researchers, students and companies of all sizes who leverage AI to produce positive outcomes--with scientific discovery, business innovation and economic impact. Come curious, explore the fascinating world of AI and leave with connections to those at the forefront of innovation.