How do you get the most out of generative AI? Stop by the Library Galleria outside the Central Reading Room to learn more! Librarians Chris Kretz and Ahmad Pratama, along with David Ecker of DoIT, will be demonstrating tools and tips for writing prompts that make the most of what AI can do. They'll be hosting Explore AI demos this Monday-Wednesday (March 3rd-5th), 12:30-1:30. Whether you're new to AI or a current user, they'd love to talk to you about it.
Location: Melville Library Galleria
Abstract:
Many real-world complex problems are multi-step reasoning tasks. These range from analytic tasks such as answering questions to automation tasks where agents complete tasks on behalf of users. Evaluation, datasets, and models for such tasks can be unreliable for multiple reasons. (i) Datasets often have annotation artifacts and biases, allowing models to take reasoning shortcuts. Such shortcuts can allow models to make effective guesses -- or, in a sense, cheat -- to achieve high performance without any multi-step reasoning. This issue is further exacerbated for complex tasks because as the number of required reasoning steps increases, so do the avenues for bypassing those steps. (ii) Models trained on such datasets learn to solve the task by taking reasoning shortcuts instead of proper multi-step reasoning. As a result, these models are not robust (reliable) when evaluated in an out-of-distribution setting. (iii) Lastly, recent works have shown that language models can solve complex multi-step tasks by producing a step-by-step explanation without any training. However, these methods often hallucinate factually incorrect (i.e., unreliable) explanations when posed with knowledge-intensive tasks.
I address these challenges by carefully characterizing the requirements of robust multi-step reasoning and designing reliable evaluation datasets and training methods that necessitate thorough multi-step reasoning. In DiRe, I first formalize and introduce Disconnected Reasoning, i.e., reasoning that allows models to arrive at the correct answer by bypassing necessary reasoning steps, and use this formalization to measure how much multi-step reasoning a model does on a dataset. In MuSiQue, I build a multi-step reasoning dataset for QA from scratch that avoids cheatability via disconnected reasoning, providing a more reliable evaluation. In TeaBReaC, I develop a synthetically generated multi-step QA pretraining dataset designed to force models to avoid disconnected reasoning and learn reliable multi-step reasoning. In IRCoT, I address the reliability of model-generated multi-step reasoning chains by interleaving models' step-by-step reasoning with step-by-step retrieval from an external corpus, resulting in more factually correct reasoning. Finally, in AppWorld, I build a multi-step reasoning dataset that requires highly interactive problem-solving in an environment carefully designed to ensure models need thorough reasoning to succeed.
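The interleaving idea behind IRCoT can be sketched as a simple loop: reason one step, then use that step as the query for the next retrieval. The sketch below is illustrative only; `generate_step` and the word-overlap retriever are stand-in stubs (not the actual LLM or retriever used in the work), and the function names are hypothetical.

```python
# Illustrative sketch of an IRCoT-style reason/retrieve loop (assumed shape;
# the real system uses an LLM and a retriever over a large external corpus).

def retrieve(query, corpus, k=1):
    """Toy retriever: rank passages by word overlap with the query."""
    def overlap(passage):
        return len(set(query.lower().split()) & set(passage.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def generate_step(question, context, steps):
    """Stub standing in for an LLM that writes the next reasoning step."""
    return f"step {len(steps) + 1} using: {context[0]}"

def ircot(question, corpus, max_steps=3):
    steps, context = [], retrieve(question, corpus)
    for _ in range(max_steps):
        step = generate_step(question, context, steps)  # reason one step
        steps.append(step)
        context = retrieve(step, corpus)  # retrieve guided by that step
    return steps
```

The key design point is that each reasoning step re-grounds the next retrieval, so later steps can pull in facts the original question alone would not surface.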
Speaker: Harsh Trivedi
Location: NCS 220 or Zoom
https://stonybrook.zoom.us/j/99096379762?pwd=zYCJZQVxRuZd9BboscO4nlodCwsKBr.1
Abstract:
Conventional approaches to scientific discovery often prioritize building larger sensors, gathering more data, and scaling up computational power. In this talk, I will present a complementary perspective: extracting insights hidden in the data we already have. The key lies in using AI not as a black-box predictor, but as a tool for interpreting data through its underlying physical process.
I will demonstrate how AI, when integrated with the physics of light propagation, can serve as a computational lens to overcome fundamental limitations in fields ranging from biomedicine to astrophysics. Specifically, I will showcase two compelling applications: non-invasive imaging through scattering biological tissues, and detecting faint exoplanets against the overwhelming brightness of their host stars.
These methods represent a departure from traditional learning-based approaches that rely on fitting models to training labels and hoping for generalization. Instead, with physics-informed strategies that decode how light propagates, we can transform raw measurements into scientifically meaningful insights, without requiring costly hardware upgrades or human-annotated datasets. Finally, I will outline future directions for combining AI with physical principles, enabling us to unlock phenomena once considered hidden and to accelerate discoveries in healthcare, astronomy, and beyond.
Short Bio:
Brandon Y. Feng is a Postdoctoral Associate at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and a Visiting Scientist at the Harvard-Smithsonian Center for Astrophysics. His research bridges artificial intelligence and physics to expand the limits of human and machine vision. He develops AI-driven methods that reveal hidden patterns in complex visual data, driving breakthroughs in areas such as exoplanet detection and imaging through scattering tissues. His work has been published in top venues, including Science Advances, CVPR, ICCV, ECCV, and NeurIPS, and has been featured in Science.org, New Scientist, and Phys.org. He holds a Ph.D. in Computer Science from the University of Maryland, along with a B.A. in Computer Science and Statistics and an M.S. in Statistics from the University of Virginia.
Location: NCS 220
Abstract: Implicit functions have long been a fundamental representation for both 2D and 3D objects in computer graphics, playing a significant role in the field's early development. With the rise of 3D deep learning and the rapid advancement of neural rendering techniques, implicit representations of 3D shapes have regained significant attention in recent years. In this talk, I will present several recent research projects focusing on implicit function-based 3D reconstruction and neural rendering. Furthermore, I will discuss potential future developments in this dynamic and rapidly evolving field.
Biography: Ying He is an Associate Professor at the College of Computing and Data Science, Nanyang Technological University, where he also serves as the Director of the Centre for Augmented and Virtual Reality. His research interests lie in geometric computation and analysis, with applications spanning computer graphics, 3D vision, computer-aided design, multimedia, and wireless sensor networks. Dr. He is an active member of the technical program committees for major conferences on geometric modeling and has served on the editorial boards of IEEE Transactions on Visualization and Computer Graphics, Computer Graphics Forum, and Computational Visual Media. He has also taken on key leadership roles as General/Program Co-Chair for several conferences, including Shape Modeling International (SMI) 2022, Solid and Physical Modeling (SPM) 2022 & 2023, Geometric Modeling and Processing (GMP) 2014 & 2021, and Computational Visual Media (CVM) 2020. For more information, please visit https://personal.ntu.
Location: NCS 115
Abstract: In high-dimensional data spaces, vast empty regions often exist where no known data points are present. These empty spaces are not merely gaps but hold untapped potential for discovering novel configurations, optimizing parameters, and improving decision-making processes. However, traditional exploration techniques struggle to identify and leverage these regions due to the curse of dimensionality. To address this, we introduce the Empty Space Search Algorithm (ESA), a scalable, physics-inspired method that systematically identifies and explores these uncharted voids. ESA operates by modeling the data space as a dynamic system, using a repulsion-attraction mechanism to locate optimal empty space configurations (ESCs) without requiring exhaustive search. Building upon ESA, we present GapMiner, a visual analytics system that integrates human-in-the-loop AI to iteratively refine and validate ESCs. GapMiner combines parallel coordinate visualization, interactive optimization, and deep learning-based predictive modeling to enhance the efficiency of empty space exploration. This methodology has broad applications, including accelerating convergence in evolutionary algorithms through a more diverse initial population, optimizing adversarial learning strategies, and discovering novel parameter configurations in reinforcement learning. Our approach demonstrates that empty space is not just an absence of data but a frontier for new possibilities in high-dimensional problem-solving.
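The repulsion mechanism described in the abstract can be illustrated with a toy version: a probe point is pushed away from nearby data points until it settles in an empty region. This is a minimal sketch of the general idea only; the actual ESA combines repulsion with attraction and other machinery not shown here, and all names below are hypothetical.

```python
import numpy as np

# Toy illustration of repulsion-driven empty-space search (assumed form;
# the actual ESA is more involved). A probe point is repelled by data
# points with an inverse-square force, drifting into an empty region.

def repel_probe(data, probe, steps=200, lr=0.05):
    data = np.asarray(data, dtype=float)
    probe = np.asarray(probe, dtype=float)
    for _ in range(steps):
        diff = probe - data  # vectors from each data point to the probe
        dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-9
        force = (diff / dist**3).sum(axis=0)  # inverse-square repulsion
        probe = np.clip(probe + lr * force, 0.0, 1.0)  # stay in unit box
    return probe
```

Starting the probe near a cluster and iterating moves it measurably farther from every data point, which is the behavior ESA scales up to high dimensions without exhaustive search.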
Bio: Xinyu Zhang received his B.E. in Computer Science from Shandong University, Taishan College, in 2019. He is currently a final-year Ph.D. candidate in the Department of Computer Science at Stony Brook University, advised by Prof. Klaus Mueller. His research focuses on multivariate data analysis, scientific visualization, and reinforcement learning. He has published multiple papers in top-tier journals and conferences, including IEEE TVCG and NeurIPS.
*This seminar will be held in person (food provided on a first come, first served basis) and online (Zoom link below)!
Topic: IACS Student Seminar
Speaker: Xinyu Zhang
Time: Feb 26, 2025 12:00 PM Eastern Time (US and Canada)
Join Zoom Meeting
https://stonybrook.zoom.us/j/
Meeting ID: 918 4821 8975
Passcode: 027337
Abstract:
Capturing the spatio-temporal (4D) dynamics of humans has been a long-standing research problem in computer vision and graphics. Synthesizing photorealistic human avatars has broad applications, ranging from immersive telepresence in AR/VR and the movie industry, to enriching the education and healthcare systems. Earlier approaches relied on hand-engineered models that use a small amount of data from one or more subjects. With the advent of neural networks, training on large datasets enhanced the output visual quality. Currently, the combination of neural networks with graphics techniques has achieved natural-looking human animation. However, most approaches are identity-specific, trained only on a single identity, and use only one modality.
In this thesis, we address the problem of learning neural representations of humans in a holistic way. Given that video data in the real world include multiple modalities (audio and video) and multiple identities, we develop multi-modal and multi-identity representations. First, we propose to reconstruct the 4D face geometry of humans by leveraging both audio and video information. In this way, the network produces accurate lip shapes and is robust to cases when either modality is insufficient. Next, we introduce a NeRF-based representation for audio-driven human face animation that achieves high-quality lip synchronization for cinematic content. Since humans communicate with their full body, combining body pose, hand gestures, and facial expressions, we extend our network to capture full-body human motion for multiple identities simultaneously. In order to better disentangle identity and non-identity specific information, we subsequently study non-linear interactions between latent factors of variation, and propose a specific multiplicative module. In this way, we learn a multi-identity NeRF that robustly animates human faces under novel expressions and achieves a significant decrease in the total training time. Similarly, we propose a multi-identity Gaussian splatting representation for human bodies, by constructing a high-order tensor. Assuming a low-rank structure, we learn a tensor decomposition that leads to a significant decrease in the total number of learnable parameters, as well as robust animation under novel poses. In the future, we propose to jointly synthesize audio and visual outputs from just text input. Given the recent rise of large language models, coupling text with natural-looking avatars can enhance the overall interaction between a human and an AI system.
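The parameter savings from a low-rank tensor decomposition, as mentioned in the abstract, are easy to see in a back-of-the-envelope count. The sketch below uses a CP (rank-one sum) factorization as an assumed example; the thesis's actual factorization and tensor dimensions may differ, and the numbers are hypothetical.

```python
# Parameter count for a dense tensor vs. a rank-r CP decomposition
# (illustrative only; dimensions and rank are made-up examples).

def full_params(dims):
    """Entries in the dense tensor: the product of its dimensions."""
    out = 1
    for d in dims:
        out *= d
    return out

def cp_params(dims, rank):
    """CP factors: one (dim x rank) factor matrix per tensor mode."""
    return sum(d * rank for d in dims)

dims = (100, 10_000, 32)  # hypothetical: identities x Gaussians x features
print(full_params(dims))          # 32,000,000 dense entries
print(cp_params(dims, rank=16))   # 162,112 factor parameters
```

Even at a generous rank, the factored form is orders of magnitude smaller than the dense tensor, which is the source of both the memory savings and the regularization the abstract credits for robust animation under novel poses.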
Speaker: Aggelina Chatziagapi
Where: NCS, Room 220
Zoom link: https://stonybrook.zoom.
ID: 98775312249
Passcode: 505777
Abstract:
Photorealistic editing of human facial expressions and head articulations remains a long-standing topic in the computer graphics and computer vision community. Methods enabling such control have great potential in AR/VR applications where a 3D immersive experience is valuable, especially when this control extends to novel views of the scene in which the human subject appears. Traditionally, 3D Morphable Face Models (3DMMs) have been used to control the facial expressions and head pose of a human head. However, the PCA-based shape and expression spaces of 3DMMs lack expressivity: they cannot model essential elements of the human head, such as hair and skin details, or accessories such as glasses, all of which are paramount for realistic reanimation. In this thesis, we present a set of methods that enable facial reanimation: starting from editing expressions in still face images, progressing to fully controllable neural 3D portraits with control over facial expressions, head pose, and viewing direction of the scene, built using only casually captured monocular videos from a smartphone, and finally achieving studio-like quality from such monocular captures.
First, we propose a method for editing facial expressions in near-frontal facial images through the unsupervised disentangling of expression-induced deformations and texture changes. Next, we extend facial expression editing to human subjects in 3D scenes. We represent the scene and the subject in it using a semantically guided neural field. This enables control over the subject's facial expressions and the viewing direction of the scene they're in. We then present a method that learns, in an unsupervised manner, to deform static 3D neural fields using facial expression and head-pose dependent deformations, enabling control over facial expressions and head pose of the subject along with the viewing direction of the 3D scene they're in. Next, we propose a method that makes the learning of the aforementioned deformation field robust to strong illumination effects, which adversely impact the registration of the deformation. We then propose an extension of this unsupervised deformation model to 3D Gaussian splatting by constraining it using a 3D morphable model, resulting in a rendering speed of 18 FPS, a 100x speedup over prior work. Finally, we propose a method that bridges the quality gap between 3D portraits created using in-the-wild monocular data and multi-view studio capture data. We accomplish this using a two-stage method. First, we train a StyleGAN to relight and inpaint in-the-wild face texture maps (with strong illumination effects and incompletely captured regions). Next, we both reconstruct and generate identity-specific facial details that may be poorly captured in the in-the-wild captures. Once trained, we can generate studio-like complete avatars from monocular phone captures.
Speaker: Shahrukh Athar
Zoom Link:
https://stonybrook.zoom.us/j/94228500743?pwd=RqOBgG6tbJkKaFBlWFwBkYFX0VRovV.1
Meeting ID: 94228500743
Passcode: 661599
Abstract: The remarkable success of large foundational models, such as LLMs and diffusion models, is built on their learning over vast amounts of static data from the Internet. However, human learning and problem-solving are fundamentally interactive processes: humans learn by engaging with their environment, tools, search engines, and feedback loops, iteratively refining their understanding and decisions. This gap between the interactivity of human learning and the static nature of model training raises a critical question: how can we imbue foundational models with the capacity for meaningful interaction?
In this talk, I will explore methods to enhance foundational models by incorporating interaction with the external environment. I will discuss strategies such as leveraging external tools, compilers, and function calls to provide dynamic feedback to foundation models. Drawing inspiration from humans' interactive learning processes, I demonstrate how interaction-driven learning can lead to models that are not only more accurate but also more adaptable to real-world applications.
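The kind of tool-driven feedback loop described above can be illustrated with a minimal sketch: a model proposes code, a real interpreter executes it, and any error is fed back for revision. Here `propose` is a stub standing in for an LLM call, and the whole example is an assumed shape of such a loop, not the speaker's actual system.

```python
# Minimal sketch of an execute-and-revise feedback loop (illustrative only).

def propose(task, feedback=None):
    """Stub model: the first attempt has a bug; it 'fixes' it after feedback."""
    return "result = 1 / 0" if feedback is None else "result = sum(range(5))"

def solve_with_feedback(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        code = propose(task, feedback)
        env = {}
        try:
            exec(code, env)  # tool call: actually run the proposed code
            return env["result"]
        except Exception as e:
            feedback = repr(e)  # dynamic feedback for the next attempt
    raise RuntimeError("no working solution within the round budget")
```

The design point is that the environment, not a static label, supplies the training signal: the interpreter's error message tells the model exactly what to revise.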
This work bridges the gap between static training paradigms and the dynamic, iterative nature of human intelligence, paving the way for a new generation of interactive AI systems.
Bio: Wenhu Chen has been an assistant professor in the Computer Science Department at the University of Waterloo and the Vector Institute since 2022. He obtained the Canada CIFAR AI Chair Award in 2022 and the CIFAR Catalyst Award in 2024. He has worked for Google DeepMind as a part-time research scientist since 2021. Before that, he obtained his PhD from the University of California, Santa Barbara under the supervision of William Wang and Xifeng Yan. His research interests lie in natural language processing, deep learning, and multimodal learning. He aims to design models to handle complex reasoning scenarios like math problem-solving, structured knowledge grounding, etc. He is also interested in building more powerful multimodal models to bridge different modalities. He received the Area Chair Award at AACL 2023, the Best Paper Honorable Mention at WACV 2021, the Best Paper Finalist at CVPR 2024, and the UCSB CS Outstanding Dissertation Award in 2021.
A lecture by
Chris Wiggins
Columbia University and
Matthew L. Jones
Princeton University
The co-authors of the book How Data Happened will trace the dynamic relationships among data, truth, and power, exploring how data-empowered algorithms have come to shape our personal, professional, and political realities.
Location: 1008 Humanities
Abstract:
Large language models (LLMs) have transformed the way humans write code, bringing unprecedented automation to software development. In this talk, I will first provide an overview of my research on enhancing LLMs' code intelligence, optimizing each step of the development pipeline towards more complex software engineering tasks. I will then delve into my key contributions, focusing on how to equip LLMs with a deeper, more comprehensive understanding of software programs. Finally, I will discuss the future of AI-driven software engineering, envisioning a new era of automation that is more reliable, intelligent, and cost-efficient.
Bio:
Yangruibo (Robin) Ding is a Ph.D. candidate in the Department of Computer Science at Columbia University. His research is at the intersection of Software Engineering and Machine Learning, focusing on developing large language models (LLMs) for code. He trains LLMs to generate, analyze, and refine software programs and constructs benchmarks to systematically evaluate LLMs in solving software engineering tasks. He also studies how to improve LLMs' reasoning capability to tackle complex programming tasks, such as debugging and patching. His interdisciplinary research has been published in top-tier conferences in software engineering, programming languages, natural language processing, and machine learning. He won an ACM SIGSOFT Distinguished Paper Award, an IEEE TSE Best Paper Runner-up, and received an IBM Ph.D. Fellowship.
Location:
NCS 120