What Generative Visual Models Understand (and Don't) About the Physical World

Event Description

Abstract: Generative visual models like Stable Diffusion and Sora generate photorealistic images and videos that are nearly indistinguishable from real ones to a naive observer. However, their grasp of the physical world remains an open question: Do they understand 3D geometry, light, and object interactions, or are they mere pixel parrots of their training data? Through systematic probing, I will demonstrate that these models, surprisingly, learn fundamental scene properties--intrinsic images such as surface normals, depth, albedo, and shading (à la Barrow & Tenenbaum, 1978)--without explicit supervision, which enables applications like image relighting. But I will also show that this knowledge is insufficient. Careful analysis reveals unexpected failures: inconsistent shadows, multiple vanishing points, and scenes that defy basic physics. Together, these findings suggest that these models excel at local texture synthesis but struggle with global reasoning, leaving a crucial gap between imitation and true understanding. I will conclude by outlining a path toward generative world models that emulate global and counterfactual reasoning, causality, and physics.

Bio: Anand Bhattad is a Research Assistant Professor at the Toyota Technological Institute at Chicago. He earned his PhD from the University of Illinois Urbana-Champaign in 2024 under the mentorship of David Forsyth. His research interests lie at the intersection of computer vision and computer graphics, with a current focus on understanding the knowledge encoded in generative models. Anand has received Outstanding Reviewer honors at ICCV 2023 and CVPR 2021, and his CVPR 2022 paper was nominated for a Best Paper Award. He actively contributes to the research community by leading workshops at CVPR and ECCV, including Scholars and Big Models: How Can Academics Adapt? (CVPR 2023), CV 20/20: A Retrospective Vision (CVPR 2024), Knowledge in Generative Models (ECCV 2024), and How to Stand Out in the Crowd? (CVPR 2025). For more details, visit https://anandbhattad.github.io/

Date Start

Fri, 03/14/2025 - 14:30

Date End

Fri, 03/14/2025 - 15:30

AI Innovation Institute
