Abstract: Recent progress in large language and vision models demonstrates how far we can go by scaling with vast internet-scale data. In contrast, physical AI, agents that perceive and act in the real world, still lags far behind. Today, both academia and industry primarily pursue generalizable physical AI by scaling up: collecting large-scale action-video datasets or training world models that enable interaction through learned environments. However, this paradigm is inherently inefficient and will soon reach a data ceiling. In this talk, I argue for a shift from scaling up to scaling out. I introduce reality world simulators, a new paradigm that converts real-world videos into diverse, interactive simulation environments. Instead of relying on more data collection, this approach expands data through structured reconstruction and recomposition, enabling both higher data efficiency and physically grounded interaction. I will present a three-pronged approach: 1) Scaling out via Digital Twins: reconstructing controllable, interactive environments from monocular videos to support diverse agent exploration. 2) Scaling out via Digital Cousins: disentangling scene structure into compositional elements to generate large-scale variations of real-world environments. 3) Scaling out via Embodied Humans: incorporating realistic human dynamics to improve safety and social compliance in robot learning. Finally, I will outline a roadmap toward building generalizable and safe physical AI systems for open-world deployment.

Bio: Dr. Wayne Wu is a postdoctoral researcher at UCLA Computer Science, working closely with Bolei Zhou, and collaborating with Trevor Darrell (UC Berkeley EECS) and Jiaqi Ma (UCLA CEE). He received his Ph.D. in Computer Science and Technology from Tsinghua University in June 2022 and was previously a visiting Ph.D. student at Nanyang Technological University. He also spent seven years in industry, where he led the research and development of products that reached more than 10 million end users worldwide. His research lies at the intersection of computer vision, robotics, and computer graphics. He focuses on developing infrastructure and methods to scale physical AI, enabling robots to work reliably and safely in the open world. He has published over 50 papers at top-tier venues including CVPR, ICCV, ICLR, NeurIPS, and ICRA, with over 9,500 citations and 10,000 GitHub stars. His work has received a CVPR Best Paper Candidate and multiple Oral, Spotlight, and Highlight presentations. He was also honored with the 2025 UCLA Chancellor's Award for Postdoctoral Research, recognizing the best postdocs at UCLA, and he was the only awardee from the School of Engineering. He serves as an Area Chair at CVPR 2026.

Location: NCS 120

AI Institute Seminar

Title: A Geometric Understanding of Deep Learning

Abstract: This work introduces an optimal transportation (OT) view of generative adversarial networks (GANs). Natural datasets have intrinsic patterns, which can be summarized as the manifold distribution principle: the distribution of a class of data is close to a low-dimensional manifold. GANs mainly accomplish two tasks: manifold learning and probability distribution transformation. The latter can be carried out using the classical OT method. From the OT perspective, the generator computes the OT map, while the discriminator computes the Wasserstein distance between the generated data distribution and the real data distribution; both can be reduced to a convex geometric optimization process. Furthermore, OT theory reveals the intrinsically collaborative, rather than competitive, relation between the generator and the discriminator, as well as the fundamental reason for mode collapse. We also propose a novel generative model, which uses an autoencoder (AE) for manifold learning and an OT map for probability distribution transformation. This AE-OT model improves theoretical rigor and transparency, as well as computational stability and efficiency; in particular, it eliminates mode collapse. The experimental results validate our hypothesis and demonstrate the advantages of our proposed model.

Abstract: This talk shows how machine learning can address challenges in astrophysics. We specifically focus on black hole simulations and supernova observations. First, we present a super-resolution technique for black hole simulations that avoids the need for high-resolution labels by leveraging the Hamiltonian and momentum constraints from general relativity. This method reduces constraint violations by one to two orders of magnitude. Next, we introduce Maven, a multimodal foundation model for supernova science. Using contrastive learning to align photometric and spectroscopic data, Maven achieves state-of-the-art results in classification and redshift estimation by pre-training on synthetic data and fine-tuning on real observations.

Bio: Thomas Helfer is a computational physicist specializing in deep learning and physics. Currently based at the Institute for Advanced Computational Science at Stony Brook University, Thomas was previously a postdoctoral fellow at Johns Hopkins and completed his PhD with Eugene Lim at King's College London. In his work, he looks to bridge topics; in his PhD, he bridged theoretical particle physics and gravitational waves. Now, in his postdoctoral work, he aims to find novel applications of deep learning in astrophysics.

*Please note: this seminar will be held in a hybrid format.*


Location: IACS Seminar Room OR Join Zoom Meeting
https://stonybrook.zoom.us/j/98617630652?pwd=tb4hplPgb3bTTifPCJTCcsn3P9vX8y.1

Meeting ID: 986 1763 0652
Passcode: 882994

Abstract: Materials used in extreme environments, such as high temperatures, irradiation, and stress, often fail due to rapid defect generation and microstructural evolution, and traditional approaches cannot explore the vast design space needed for next-generation alloys. I will present a machine learning framework powered by massive computing that links individual atomic motion to microstructural evolution. Neural network kinetics models trained on first-principles data map vacancy barrier spectra and capture correlated diffusion in multicomponent alloys, revealing design strategies to suppress radiation damage. At larger scales, simulations uncover dislocation patterning and distinguish between confined and extended slip bands, offering new insight into collective dislocation motion and deformation instabilities. By integrating AI-driven modeling, large-scale computing, and experimental validation, my research goal is to accelerate the discovery of damage-tolerant materials and advance fundamental understanding of defect physics in extreme environments.

Speaker Bio: Penghui Cao is an Associate Professor in Mechanical and Aerospace Engineering at the University of California, Irvine, with a joint appointment in Materials Science and Engineering. He received his PhD in mechanical engineering from Boston University and subsequently worked as a Postdoctoral Associate in the Department of Nuclear Science and Engineering at the Massachusetts Institute of Technology from 2014 to 2018. Dr. Cao's research focuses on understanding the fundamental mechanisms that govern radiation responses and microstructure evolution in materials, and on developing advanced alloys for high-performance nuclear energy systems. His lab advances computational and modeling algorithms, integrates advanced manufacturing techniques to tailor microstructures, and leverages state-of-the-art electron microscopy to characterize and assess underlying mechanisms. He is the recipient of the DOE Early Career Research Program Award and the UCI Samueli School's Mid-Career Award for Faculty Excellence in Research.

Location: Institute for Advanced Computational Science, Seminar Room

*This seminar will be held in-person and online. Zoom link below*

Join Zoom Meeting: https://stonybrook.zoom.us/j/96410717491?pwd=3WGMwbLYNMSbI2IF160VXkvv2JmCQ1.1

Meeting ID: 964 1071 7491
Passcode: 399333

Abstract: Many foundation models for digital pathology have been released recently. Benchmarking the available methods is therefore paramount to get a clearer view of the research landscape. For this reason, we introduce THUNDER, a tile-level benchmark for digital pathology foundation models. It allows efficient comparison of many models on diverse datasets with a series of downstream tasks, studying their feature spaces and assessing the robustness and uncertainty of predictions informed by their embeddings. Such foundation models are often used as feature extractors and combined with Multiple Instance Learning (MIL) aggregators for downstream tasks. Such aggregation must be efficient and reliable. We will focus on two specific examples of this: (i) HistAug, a fast and efficient generative model for controllable augmentations in the latent space of foundation models to perform data augmentation for MIL, and (ii) CAR-MIL, a method based on counterfactual attention regularisation to improve the reliability of attention maps of MIL methods.

Short bio: Pierre Marza is a Postdoctoral Researcher at CentraleSupelec in the Biomathematics team of the MICS lab, studying Computer Vision and Deep Learning for Medical Imaging, with a focus on Digital Pathology. Prior to this, he was a PhD student at INSA Lyon, in the LIRIS and CITI labs, advised by Christian Wolf and co-advised by Laetitia Matignon and Olivier Simonin. He studied Visual Navigation, Embodied AI, and Spatial Reasoning, and more specifically how to learn to represent 3D space, generalize to new environments, and master diverse tasks from light supervision.

Location: NCS 220

Zoom: https://stonybrook.zoom.us/j/94798224254?pwd=CFraer25qnpORbJ14aAVHRwaSJOjJM.1

The overall purpose of this seminar is to bring together people with interests in Computer Vision theory and techniques and to examine current research issues. This course is appropriate for people who have already taken a graduate Computer Vision course or already have research experience in Computer Vision. To enroll in this course, you must either: (1) be in the PhD program or (2) receive permission from the instructors.

Each seminar will consist of multiple short talks (around 10 minutes each) by different speakers. Students can register for 1 credit for CSE 656. Registered students must attend and present a minimum of 2 or 3 talks. Everyone else is welcome to attend. Fill in https://forms.gle/pCVXovgfMfQwGqG38 to subscribe to our mailing list for further announcements.

Abstract: Pretraining vision encoders with self-supervised learning (SSL) leads to stronger representations that excel across diverse downstream tasks. One of the key factors enabling self-supervision is extracting multiple views of the same scene to formulate either: 1) View-invariant pretraining (DINO, SimCLR, iBOT), where the objective is predicting the same representation for different views of the scene; or 2) Cross-view pretraining (cross-view Masked Autoencoders), where the objective is predicting missing parts of one view using other views. For extracting multiple views, view-invariant methods rely on a combination of handcrafted augmentations (random cropping, color jittering, Gaussian blur, etc.) of the same image, whereas cross-view pretraining methods rely on image cropping or video frames. In this work, we present methods to effectively incorporate synthetic views from diffusion models into SSL training.
For view-invariant pretraining, we introduce Gen-SIS, a method that leverages the ability of diffusion models to generate interpolated images through interpolation in conditioning space. We introduce a disentanglement pretext task: disentangling two source images from an interpolated synthetic image. This disentanglement task, in addition to vanilla single-source generative augmentation for view extraction, improves visual pretraining of various view-invariant methods (DINO, SimCLR, iBOT).
For cross-view pretraining, we introduce CDG-MAE, a novel cross-view masked autoencoder (MAE)-based method that uses diverse synthetic views generated from static images via an image-conditioned diffusion model to learn dense correspondences. We present a quantitative method to evaluate the local and global consistency of the generated views to choose the right diffusion model for cross-view pretraining. These generated views exhibit substantial changes in pose and perspective, providing a rich training signal that overcomes the limitations of video-based (expensive) and crop-based (less variation) methods. CDG-MAE substantially narrows the gap to video-based MAE methods on video label propagation tasks while maintaining the data advantages of image-only MAEs.

Speaker: Varun Belagali

Location: NCS 120
Zoom: https://stonybrook.zoom.us/j/93647452432?pwd=hZaX7LXCAD8KPHWYE1Afw2sDI3owpv.1

Stony Brook University Libraries invites students, faculty, and staff to join a conversation about how AI is transforming the private-sector workforce. As AI tools move from experimentation to everyday business use, companies are rethinking roles, skill sets, leadership, and long-term strategy. This discussion-based event will focus on the fast-paced changes and directions at tech companies and their possible impact. It will be particularly relevant for students who are preparing for an AI-influenced job market and want to position themselves for opportunities in a rapidly evolving professional landscape.

The discussion will be led by Tariq Khan, Senior Director of Private Cloud Solutions at Hewlett Packard Enterprise. Tariq is a technology leader and architect with experience across private cloud, hybrid cloud, and data center platforms. He is responsible for shaping the technology architecture and strategic direction of HPE's Private Cloud offerings across on-premises and cloud-integrated environments.

Light refreshments will be served.


Location: Melville Library, NRR, Learning Lab