AI Innovation Institute

Optimization and Machine Learning - presented by Yifan Sun

Abstract: Optimization is a growing topic of interest in the machine learning community. It starts out as an option to check in TensorFlow (SGD? Adam? Adagrad?), but as we get more into the how and why of these options, we uncover many fundamental principles relating to operations research, control theory, and dynamical systems, dating back as far as the Cold War era.

In this talk I will give a broad overview of some of the important optimization themes in machine learning. I will try to draw connections between tools we are used to seeing in popular packages and fundamental optimization concepts like duality, convexity, contractive operators, etc. While we cannot hope to completely cover this diverse research area, I hope to provide a glimpse of an exciting field that is permeating more and more into the machine learning world.
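As a rough illustration of the optimizer options the abstract names, the sketch below writes out the standard SGD, Adagrad, and Adam update rules in plain Python. The toy objective f(x) = (x - 3)^2, the step counts, and the learning rates are assumptions chosen only for this example; they are not taken from the talk.

```python
import math

def grad(x):
    # Gradient of the assumed toy objective f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)

def sgd(x, steps=500, lr=0.1):
    # Plain gradient descent: a fixed step along the negative gradient.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adagrad(x, steps=500, lr=0.5, eps=1e-8):
    # Adagrad: scale each step by the root of the accumulated squared gradients.
    g2_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        g2_sum += g * g
        x -= lr * g / (math.sqrt(g2_sum) + eps)
    return x

def adam(x, steps=500, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected moving averages of the gradient and its square.
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g * g
        m_hat = m / (1.0 - b1 ** t)
        v_hat = v / (1.0 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

Starting from x = 0.0, all three routines move toward the minimizer at x = 3. They differ only in how the raw gradient is rescaled, which is the kind of distinction the checkbox in a package hides.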

Bio: Yifan Sun received her PhD in Electrical Engineering from the University of California Los Angeles in 2015, with research focusing on convex optimization and semidefinite programming. She was then at Technicolor Research and Innovation, focusing on machine learning and data science applications. More recently, she completed two postdocs focusing on optimization, at the University of British Columbia in Vancouver, Canada and INRIA, in Paris, France.

Read more about AI Institute Seminar: Yifan Sun - Optimization and Machine Learning

Abstract: The faster AI automation spreads through the economy, the more profound its potential impacts, both positive (improved productivity) and negative (worker displacement). The previous literature on AI Exposure cannot predict this pace of automation since it attempts to measure an overall potential for AI to affect an area, not the technical feasibility and economic attractiveness of building such systems. In this work, we present a new type of AI task automation model that is end-to-end, estimating: the level of technical performance needed to do a task, the characteristics of an AI system capable of that performance, and the economic choice of whether to build and deploy such a system. The result is a first estimate of which tasks are technically feasible and economically attractive to automate - and which are not. We focus on computer vision, where cost modeling is more developed. We find that at today's costs U.S. businesses would choose not to automate most vision tasks that have AI Exposure, and that only 23% of worker wages being paid for vision tasks would be attractive to automate. This slower roll-out of AI can be accelerated if costs fall rapidly or if it is deployed via AI-as-a-service platforms that have greater scale than individual firms, both of which we quantify. Overall, our findings suggest that AI job displacement will be substantial, but also gradual - and therefore there is room for policy and retraining to mitigate unemployment impacts.

Details of this work can be found here.

Speaker Bio: Neil Thompson is the Director of the FutureTech research project at MIT's Computer Science and Artificial Intelligence Lab and a Principal Investigator at MIT's Initiative on the Digital Economy.

Previously, he was an Assistant Professor of Innovation and Strategy at the MIT Sloan School of Management, where he co-directed the Experimental Innovation Lab (X-Lab), and a Visiting Professor at the Laboratory for Innovation Science at Harvard. He has advised businesses and government on the future of Moore's Law, has been on National Academies panels on transformational technologies and scientific reliability, and is part of the Council on Competitiveness' National Commission on Innovation & Competitiveness Frontiers.

He has a PhD in Business and Public Policy from Berkeley, where he also earned Master's degrees in Computer Science and Statistics. He also has a Master's in Economics from the London School of Economics, and undergraduate degrees in Physics and International Development. Prior to academia, he worked at organizations such as Lawrence Livermore National Laboratory, Bain and Company, the United Nations, the World Bank, and the Canadian Parliament.

Location: IACS Seminar Room

Read more about Which Tasks are Cost-Effective to Automate with Computer Vision?

AI Seminar: Video Architecture Search - Michael Ryoo

Abstract: Video understanding is a challenging problem. Because a video contains spatio-temporal data, its feature representation is required to abstract both appearance and motion information. This is not only essential for automated understanding of the semantic content of videos, such as Web-video classification or sport activity recognition, but is also crucial for robot perception and learning. Previously, convolutional neural networks (CNNs) for videos were normally built by manually extending known 2D architectures such as Inception and ResNet to 3D or by carefully designing two-stream CNN architectures that fuse together both appearance and motion information. However, designing an optimal video architecture to best take advantage of spatio-temporal information in videos still remains an open problem. In this talk, we discuss recent progress in neural architecture search for videos, obtaining better network architectures for video understanding.

Read more about AI Seminar: Video Architecture Search by Michael Ryoo

Abstract: Gaussian Probability Path-based Generative Models (GPPGMs) generate data by reversing a stochastic process that progressively corrupts samples with Gaussian noise. Despite state-of-the-art results in 3D molecular generation, their deployment is hindered by the high cost of long generative trajectories, often requiring hundreds to thousands of steps during training and sampling. In this work, we propose a principled method, named GAGA, to improve generation efficiency without sacrificing training granularity or inference fidelity of GPPGMs. Our key insight is that different data modalities obtain sufficient Gaussianity at markedly different steps during the forward process. Based on this observation, we analytically identify a characteristic step at which molecular data attains sufficient Gaussianity, after which the trajectory can be replaced by a closed-form Gaussian approximation. Unlike existing accelerators that coarsen or reformulate trajectories, our approach preserves full-resolution learning dynamics while avoiding redundant transport through truncated distributional states. Experiments on 3D molecular generation benchmarks demonstrate that GAGA achieves substantial improvements in both generation quality and computational efficiency.
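To make the idea of a forward process "attaining Gaussianity" concrete, here is a minimal sketch of the standard closed-form marginal of a variance-preserving Gaussian noising process, q(x_t | x_0) = N(sqrt(ab_t) x_0, 1 - ab_t). It is not the GAGA criterion, which the abstract does not spell out. The linear beta schedule, the function names, and the threshold rule in `first_step_below` are all assumptions made for illustration.

```python
import math

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def forward_marginal(x0, alpha_bar):
    """Mean and standard deviation of q(x_t | x_0) for a scalar x0."""
    return math.sqrt(alpha_bar) * x0, math.sqrt(1.0 - alpha_bar)

def first_step_below(alpha_bars, signal_threshold=0.05):
    """Hypothetical rule: first step where the signal scale sqrt(ab_t) drops below a threshold."""
    for t, ab in enumerate(alpha_bars):
        if math.sqrt(ab) < signal_threshold:
            return t
    return None
```

Under this schedule the signal scale shrinks monotonically, so beyond some step the marginal is close to a standard Gaussian regardless of x_0. That is the regime in which a closed-form approximation of the remaining trajectory becomes plausible.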

Speaker: Jingxiang Qu

Location: New Computer Science 220

Read more about GAGA: Gaussianity-Aware Gaussian Approximation for Efficient 3D Molecular Generation

Abstract:

Conventional approaches to scientific discovery often prioritize building larger sensors, gathering more data, and scaling up computational power. In this talk, I will present a complementary perspective: extracting insights hidden in the data we already have. The key lies in using AI not as a black-box predictor, but as a tool for interpreting data through its underlying physical process.

I will demonstrate how AI, when integrated with the physics of light propagation, can serve as a computational lens to overcome fundamental limitations in fields ranging from biomedicine to astrophysics. Specifically, I will showcase two compelling applications: non-invasive imaging through scattering biological tissues, and detecting faint exoplanets against the overwhelming brightness of their host stars.

These methods represent a departure from traditional learning-based approaches that rely on fitting models to training labels and hoping for generalization. Instead, with physics-informed strategies that decode how light propagates, we can transform raw measurements into scientifically meaningful insights--without requiring costly hardware upgrades or human-annotated datasets. Finally, I will outline future directions for combining AI with physical principles, enabling us to unlock more phenomena once considered hidden and accelerating discoveries in healthcare, astronomy, and beyond.

Short Bio:

Brandon Y. Feng is a Postdoctoral Associate at MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and a Visiting Scientist at the Harvard-Smithsonian Center for Astrophysics. His research bridges artificial intelligence and physics to expand the limits of human and machine vision. He develops AI-driven methods that reveal hidden patterns in complex visual data, driving breakthroughs in areas such as exoplanet detection and imaging through scattering tissues. His work has been published in top venues, including Science Advances, CVPR, ICCV, ECCV, and NeurIPS, and has been featured in Science.org, New Scientist, and Phys.org. He holds a Ph.D. in Computer Science from the University of Maryland, along with a B.A. in Computer Science and Statistics and an M.S. in Statistics from the University of Virginia.

Location: NCS 220

Read more about AI as a Lens: Expanding Vision for Scientific Discovery

Simons Laufer Mathematical Sciences Institute presents...

In 2023, Tudor Achim co-founded Harmonic with Vlad Tenev to build the world's most advanced reasoning engine. Combining formal verification with informal reasoning, Harmonic's formal reasoning model, Aristotle, achieved gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver.

Achim is also the Co-Founder and former CTO of Helm.ai. He holds a B.S. in Computer Science from Carnegie Mellon University and was a PhD Candidate in Computer Science at Stanford University.

Read more about The Future of Formal Mathematics is Here

The Forty-Second International Conference on Machine Learning will take place at the Vancouver Convention Center from July 13th to July 19th. Register here.

Read more about International Conference on Machine Learning 2025

Abstract:

Capturing the spatio-temporal (4D) dynamics of humans has been a long-standing research problem in computer vision and graphics. Synthesizing photorealistic human avatars has broad applications, ranging from immersive telepresence in AR/VR and the movie industry, to enriching the education and healthcare systems. Earlier approaches relied on hand-engineered models that use a small amount of data from one or more subjects. With the advent of neural networks, training on large datasets enhanced the output visual quality. Currently, the combination of neural networks with graphics techniques has achieved natural-looking human animation. However, most approaches are identity-specific, trained only on a single identity, and use only one modality.

In this thesis, we address the problem of learning neural representations of humans in a holistic way. Given that the video data in the real world include multiple modalities (audio and video) and multiple identities, we develop multi-modal and multi-identity representations. First, we propose to reconstruct the 4D face geometry of humans by leveraging both audio and video information. In this way, the network produces accurate lip shapes and is robust to cases when either modality is insufficient. Next, we introduce a NeRF-based representation for audio-driven human face animation that achieves high-quality lip synchronization for cinematic content. Since humans communicate with their full body, combining body pose, hand gestures, and facial expressions, we extend our network to capture the full-body human motion for multiple identities simultaneously. In order to better disentangle identity and non-identity specific information, we subsequently study non-linear interactions between latent factors of variation, and propose a specific multiplicative module. In this way, we learn a multi-identity NeRF that robustly animates human faces under novel expressions and achieves a significant decrease in the total training time. Similarly, we propose a multi-identity Gaussian splatting representation for human bodies, by constructing a high-order tensor. Assuming a low-rank structure, we learn a tensor decomposition that leads to a significant decrease in the total number of learnable parameters, as well as to a robust animation under novel poses. In the future, we propose to jointly synthesize audio and visual outputs from just text input. Given the recent rise of large language models, coupling text with natural-looking avatars can enhance the overall interaction between a human and an AI system.
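The parameter savings from a low-rank tensor structure can be seen with simple counting. The sketch below assumes a CP (canonical polyadic) decomposition, which the abstract does not confirm as the form used in the thesis. The tensor shape and rank in the usage note are hypothetical numbers picked for illustration.

```python
def full_tensor_params(shape):
    """Number of entries in a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count

def cp_params(shape, rank):
    """Parameters in a rank-R CP model: one (dim x rank) factor matrix per mode."""
    return rank * sum(shape)

def cp_entry(factors, index):
    """Entry T[i, j, ...] = sum over r of the product of factors[m][index[m]][r]."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for factor, i in zip(factors, index):
            term *= factor[i][r]
        total += term
    return total
```

For a hypothetical tensor of shape (10, 1000, 32), the dense form has 320000 entries, while a rank-8 CP model stores 8 * (10 + 1000 + 32) = 8336 parameters. Individual entries are reconstructed on demand with `cp_entry`.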

Speaker: Aggelina Chatziagapi

Where: NCS, Room 220

Zoom link: https://stonybrook.zoom.us/j/98775312249?pwd=uORNAnSdcssrPZdqOsqaMAF5aLcRD9.1
ID: 98775312249
Passcode: 505777

Read more about Multi-Modal Neural Representations for Humans

Abstract: In high-dimensional data spaces, vast empty regions often exist where no known data points are present. These empty spaces are not merely gaps but hold untapped potential for discovering novel configurations, optimizing parameters, and improving decision-making processes. However, traditional exploration techniques struggle to identify and leverage these regions due to the curse of dimensionality. To address this, we introduce the Empty Space Search Algorithm (ESA), a scalable, physics-inspired method that systematically identifies and explores these uncharted voids. ESA operates by modeling the data space as a dynamic system, using a repulsion-attraction mechanism to locate optimal empty space configurations (ESCs) without requiring exhaustive search. Building upon ESA, we present GapMiner, a visual analytics system that integrates human-in-the-loop AI to iteratively refine and validate ESCs. GapMiner combines parallel coordinate visualization, interactive optimization, and deep learning-based predictive modeling to enhance the efficiency of empty space exploration. This methodology has broad applications, including accelerating convergence in evolutionary algorithms through a more diverse initial population, optimizing adversarial learning strategies, and discovering novel parameter configurations in reinforcement learning. Our approach demonstrates that empty space is not just an absence of data but a frontier for new possibilities in high-dimensional problem-solving.
Bio: Xinyu Zhang received his B.E. in Computer Science from Shandong University, Taishan College, in 2019. He is currently a final-year Ph.D. candidate in the Department of Computer Science at Stony Brook University, advised by Prof. Klaus Mueller. His research focuses on multivariate data analysis, scientific visualization, and reinforcement learning. He has published multiple papers in top-tier journals and conferences, including IEEE TVCG and NeurIPS.
*This seminar will be held in person (food provided on a first-come, first-served basis) and online (Zoom link below)!
Topic: IACS Student Seminar Speaker: Xinyu Zhang
Time: Feb 26, 2025 12:00 PM Eastern Time (US and Canada)
Join Zoom Meeting
https://stonybrook.zoom.us/j/91848218975?pwd=lfITFa61GaXZ2Wsa1B1OnbLQMmXvOE.1

Meeting ID: 918 4821 8975
Passcode: 027337

Read more about Into the Void: Mapping the Unseen Gaps in High Dimensional Data

The Program in Writing and Rhetoric
Invites you to
A Rhetorical/Deliberative Framework for AI Language Model Alignment
featuring
Prof. Zoltan Majdik
Professor, North Dakota State University
In this talk, Prof. Majdik proposes a framework for aligning LLMs with values grounded in the norms of rhetorical culture and deliberative democracy. Alongside long-standing AI alignment value targets like safety and transparency, this AI alignment framework assesses to what extent a language model exhibits human and humane values that foster communicative engagement, and it codifies approaches to tuning existing models to better align with such values.

Location: Humanities 1008

Read more about A Rhetorical/Deliberative Framework for AI Language Model Alignment