How to Do Spectral Learning at Scale for Science and Engineering

Abstract: Spectral decompositions such as singular value decompositions (SVDs) and eigenvalue decompositions (EVDs) are central tools across a vast swath of scientific computing and machine learning, with abundant engineering applications. Yet many modern methods for learning such decompositions in high dimensions struggle with instability, bias, and poor scalability, even when approximation power is not the limiting factor. I argue that these difficulties are not intrinsic to spectral problems, but instead arise from a shared reliance on Rayleigh-quotient-based constrained optimization, which forces explicit orthogonality handling through penalties, normalization, or whitening.
To address these challenges, I present a reformulation based on unconstrained variational objectives that implicitly encode spectral structure, eliminating the need for orthogonalization and ad hoc regularization. This perspective leads to a conceptually simpler and scalable parametric framework for learning ordered spectral representations via nested optimization. The resulting framework is well matched to diverse settings in science and engineering. As examples, I demonstrate its effectiveness on eigenvalue problems for linear PDEs such as the Schrödinger equation, spectral (Koopman) analysis of nonlinear dynamical systems such as molecular dynamics, and structured representation learning with deep neural nets. Collectively, these examples illustrate how abandoning Rayleigh-quotient-based formulations resolves long-standing optimization pathologies across domains.

Bio: Jongha (Jon) Ryu is a postdoctoral associate at MIT EECS. He received his Ph.D. in Electrical and Computer Engineering from UC San Diego. His research develops statistical and mathematical foundations for scientific machine learning, with a focus on scalable spectral methods, efficient generative modeling, and reliable uncertainty quantification for scientific and engineering systems.

Location: NCS 120
Abstract: Autonomous systems, whether on Earth or in space, rely on 3D perception to understand and interact with the world around them. Yet traditional techniques for 3D understanding often depend on human-designed features, fixed sensors, and conventional imaging modalities. This constrained approach can limit every stage of perception, from sensing to interpretation to decision making.
In this talk, we'll explore an alternative paradigm for imaging: physically based neural representations for 3D scenes and 3D sensing systems. We will discuss how recent advances in large-scale learned representations can be used to jointly optimize both 3D scene models and the design of sensing systems for 3D capture, with the goal of enabling task-specific perception systems.
Unlike modern AI models trained on internet-scale datasets, these specialized 3D representations typically operate in data-sparse regimes and therefore require a different kind of prior. We'll examine how grounding these learned representations in the physics of light transport can improve our understanding of scene structure and inform imaging system design, even with limited data. By connecting physical insights with learned representations, we'll highlight new possibilities for robust, efficient, and adaptive perception in challenging environments.

Speaker: Nikhil Behari is a graduate student in the Camera Culture group at the MIT Media Lab, advised by Professor Ramesh Raskar. His research interests include computational imaging, 3D scene understanding, and multi-agent decision-making under uncertainty, with a focus on automating imaging system design for 3D perception in human and planetary health. His research is supported by the NASA Space Technology Graduate Research Fellowship. He received his bachelor's in Computer Science and Statistics from Harvard University in 2022.

Abstract:
Quantifying similarity is a central notion in science and data analysis, pervading everything from phylogenetic trees to the foundations of clustering. Unfortunately, despite being examined and applied for decades, traditional similarity and distance metrics have a fundamental drawback: they are only defined over pairs of objects, so the number of comparisons needed to compare N objects scales quadratically with N. The present explosion in the amount of available data requires new ways to process information, and while some current algorithms can handle millions of points, we need alternatives applicable to billions. This motivated us to develop a new framework that can compare any number of objects at the same time, achieving unprecedented linear scaling when comparing multiple objects. Here we will discuss the main properties of this formalism, along with its applications in drug design and in the analysis of Molecular Dynamics (MD) simulations. Our indices have proven highly versatile in chemical space exploration and visualization, allowing us to rigorously quantify the chemical diversity of very large molecular libraries. This has led to several algorithms for sampling important regions of chemical space, including a more efficient way of identifying the prevalence of activity cliffs. Additionally, our indices provide a convenient route to sampling complex MD trajectories, allowing us to identify representative structures very efficiently. We can also cluster biological ensembles more robustly than with standard algorithms, which has led to our group's work on MDANCE, a flexible and efficient open-source clustering module. Drop by if you want to know how we clustered one billion molecules!
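The linear-scaling idea can be illustrated with a deliberately simplified sketch. This is not the speaker's extended similarity indices; it only shows why aggregate statistics can replace pairwise loops. Assuming objects are binary fingerprints and similarity is the simple matching score (fraction of agreeing bits), the average over all N(N-1)/2 pairs can be computed exactly from per-column counts of 1s in O(NM) time instead of O(N²M). The function names are hypothetical.

```python
import numpy as np

def mean_pairwise_matching_quadratic(fps):
    """Reference: average simple-matching similarity over all pairs, O(N^2 * M)."""
    fps = np.asarray(fps)
    n = len(fps)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.mean(fps[i] == fps[j])  # fraction of bits that agree
            pairs += 1
    return total / pairs

def mean_pairwise_matching_linear(fps):
    """Same quantity from column sums only, O(N * M). Illustrative sketch."""
    fps = np.asarray(fps, dtype=np.int64)
    n, m = fps.shape
    ones = fps.sum(axis=0)   # number of objects with bit = 1 in each column
    zeros = n - ones         # number of objects with bit = 0 in each column
    # Pairs agreeing at a column: both 1 (C(ones, 2)) or both 0 (C(zeros, 2)).
    agreeing = ones * (ones - 1) // 2 + zeros * (zeros - 1) // 2
    n_pairs = n * (n - 1) // 2
    return agreeing.sum() / (n_pairs * m)
```

For example, the fingerprints `[[1, 0, 1], [1, 1, 1], [0, 0, 1]]` give pairwise similarities 2/3, 2/3, and 1/3, and both functions return their mean, 5/9. Only the column-sum version stays practical as N grows large.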

Speaker:
Assistant Professor, Department of Chemistry and Quantum Theory Project
University of Florida, Gainesville
Website: https://quintana.chem.ufl.edu/

Location:
Laufer Center Lecture Hall 101

Abstract:

Conventional approaches to scientific discovery often prioritize building larger sensors, gathering more data, and scaling up computational power. In this talk, I will present a complementary perspective: extracting insights hidden in the data we already have. The key lies in using AI not as a black-box predictor, but as a tool for interpreting data through its underlying physical process.

I will demonstrate how AI, when integrated with the physics of light propagation, can serve as a computational lens to overcome fundamental limitations in fields ranging from biomedicine to astrophysics. Specifically, I will showcase two compelling applications: non-invasive imaging through scattering biological tissues, and detecting faint exoplanets against the overwhelming brightness of their host stars.

These methods represent a departure from traditional learning-based approaches that rely on fitting models to training labels and hoping for generalization. Instead, with physics-informed strategies that decode how light propagates, we can transform raw measurements into scientifically meaningful insights, without requiring costly hardware upgrades or human-annotated datasets. Finally, I will outline future directions for combining AI with physical principles, helping to reveal phenomena once considered hidden and accelerating discoveries in healthcare, astronomy, and beyond.

Short Bio:

Brandon Y. Feng is a Postdoctoral Associate at MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and a Visiting Scientist at the Harvard-Smithsonian Center for Astrophysics. His research bridges artificial intelligence and physics to expand the limits of human and machine vision. He develops AI-driven methods that reveal hidden patterns in complex visual data, driving breakthroughs in areas such as exoplanet detection and imaging through scattering tissues. His work has been published in top venues, including Science Advances, CVPR, ICCV, ECCV, and NeurIPS, and has been featured in Science.org, New Scientist, and Phys.org. He holds a Ph.D. in Computer Science from the University of Maryland, along with a B.A. in Computer Science and Statistics and an M.S. in Statistics from the University of Virginia.

Location: NCS 220
Climate Uncertainty, Decision Making, and AI for Earth System Predictability
Dr. Nathan Urban, Brookhaven National Laboratory

Bio: Nathan Urban is the group leader of the Optimal Experimental Design & Uncertainty Quantification group in the Applied Mathematics Department at Brookhaven National Laboratory's Computing & Data Sciences directorate (CDS). He holds a Ph.D. in theoretical condensed matter physics from Penn State, and has previously held research positions at Los Alamos National Laboratory, Princeton, and Penn State. His research interests include Bayesian inference and spatiotemporal statistics, probabilistic prediction and forecasting, multi-model / model-form / model structural uncertainty quantification, reduced order modeling, scientific machine learning and hybrid physical-data driven modeling, in-situ/streaming data analysis at scale, information fusion, decision making under uncertainty and optimal experimental design, and integrated multiscale computational frameworks for decision support.

Location: IACS Seminar Room

Lunch will be provided
The Future Histories Studio will host Young Maeng, an artist and professor at California State University, Fresno, for a talk exploring the intersection of artificial intelligence (AI) and traditional painting, examining how two seemingly disparate fields can converge to create new artistic expressions.

The lecture is part of the Future Histories Studio series at Stony Brook University, a platform dedicated to examining the evolving relationship between technology, art, and society.

Maeng will discuss her innovative approach to expanded painting, which integrates AI-generated images with traditional techniques such as Korean ink and acrylic painting. Through this fusion, she visualizes complex philosophical and ethical questions about the coexistence of humans, nature, and AI companion robots. The lecture will highlight the broader implications of AI in the art world, touching on how AI technologies challenge conventional notions of creativity and human-centric perspectives in art.

Speaker Bio:

Young Maeng is an artist and professor at California State University, Fresno, whose work explores the intersection of artificial intelligence (AI) and traditional painting techniques such as Korean ink and acrylic.

Maeng's innovative approach to expanded painting blends AI technology with traditional methods to visualize complex philosophical and ethical questions surrounding the coexistence of humans, nature, and AI companion robots.

Location: Future Histories Studio
Register here: https://www.eventbrite.ca/e/ai-and-painting-tickets-1021050809457?aff=oddtdtcreator

Chat with Sociology faculty as they share their paths to Stony Brook: what inspired their careers, what led them to teaching, and the experiences that shaped their academic journeys.

Dr. Yongjun Zhang

Assistant Professor of Sociology, Departments of Sociology and AAAS

Join this opportunity to talk with Yongjun Zhang about his new interest in the responsible use of AI to address climate and health issues. Lunch will be served.

Location: SBS Level 4, Sociology Reading Room

The next AI Institute seminar speaker will be Chao Chen of Biomedical Informatics, on Monday, November 29, at noon via Zoom:

https://stonybrook.zoom.us/j/96233844681?pwd=aVVsUnIzMWJDMHRqVXcrQU5HMjFVQT09

He will speak on "Detection of Trojan Attacks to Deep Neural Networks: A Topological Perspective." His abstract and bio are below.

Abstract: Deep neural networks are known to have security issues. One particular threat is the Trojan attack, in which attackers stealthily manipulate the model's behavior through Trojaned training samples, i.e., samples with a special trigger injected and their labels altered. Identifying a Trojaned model at deployment is challenging because of limited access to the training data. We propose to identify Trojaned neural networks using methods from topological data analysis. In particular, we propose to (1) inspect high-order topological features of the neuron interactions and (2) reverse engineer the injected triggers using a topological loss. These approaches take different angles and reveal insights into the behavior of neural networks when their strong memorization power is exploited maliciously. The work has been accepted to NeurIPS'21. I will also briefly mention other research directions from my group, including incorporating topological information into deep image analysis, topology-inspired graph neural networks, and robust training of neural networks with label noise. These works have been published in ICLR, ICML, NeurIPS, ECCV, ICCV, and AAAI in recent years.
Bio: Dr. Chao Chen is an assistant professor of Biomedical Informatics at Stony Brook University. His research interests span topological data analysis (TDA), machine learning, and biomedical image analysis. He develops principled learning methods inspired by TDA theory, such as persistent homology and discrete Morse theory. These methods address problems in biomedical image analysis, robust machine learning, and graph neural networks from a unique topological view. His research results have been published in major machine learning, computer vision, and medical image analysis conferences. He is serving as an area chair for MICCAI, AAAI, CVPR, and NeurIPS.

Learn how to prompt AI to help clean datasets and write formulas in Google Sheets.

When you have a messy dataset, it can take a lot of time to clean it up before you can start analyzing. Can AI help? In this workshop, we'll collect live data and then use Gemini AI (the standalone tool) to help clean up the data. Then, we'll use it to help with some analysis. Because we'll be working with live data in Gemini, we don't know exactly what will happen, but that's the reality of data and data cleaning!

In this session, you will:

  1. Craft effective AI prompts to generate Google Sheets formulas for data analysis and manipulation
  2. Use Gemini to develop regular expression formulas to extract, reformat, and clean text-based data
  3. Develop formulas for numerical analysis using Gemini AI

https://stonybrookuniversity.co1.qualtrics.com/jfe/form/SV_dht1o3rNzlZhHka?source=event+manager&session=0815250900sheets