Reception to follow.
Abstract:
In this talk, I will present our journey of developing diverse, adaptive, uncertainty-calibrated AI planning agents that can robustly communicate and collaborate for multi-agent reasoning (on math, commonsense, coding, etc.) as well as for interpretable, controllable multimodal generation (across text, images, videos, audio, layouts, etc.). In the first part, we will discuss improving reasoning via multi-agent discussion among diverse LLMs and structured distillation of these discussion graphs (ReConcile, MAGDi); adaptively learning to balance abstraction, decomposition, refinement, and fast+slow thinking in LLM-agent reasoning (ReGAL, ADaPT, MAgICoRe, System-1.x); as well as confidence calibration in LLMs via speaker-listener pragmatic reasoning and making LLMs better teammates via multi-agent positive-negative persuasion balancing (LACIE, PBT). In the second part, we will discuss interpretable and controllable multimodal generation via LLM-agent-based planning and programming, such as layout-controllable image generation (and evaluation) via visual programming (VPGen+VPEval); consistent multi-scene video generation via LLM-guided planning (VideoDirectorGPT); interactive and composable any-to-any multimodal generation (CoDi, CoDi-2); as well as feedback-driven multi-agent interaction for adaptive environment/data generation via weakness discovery (EnvGen, DataEnvGym).
Bio:
Dr. Mohit Bansal is the John R. & Louise S. Parker Distinguished Professor and the Director of the MURGe-Lab (in the UNC-NLP Group) in the Computer Science Department at UNC Chapel Hill. He received his PhD from UC Berkeley in 2013 and his BTech from IIT Kanpur in 2008. His research expertise is in natural language processing and multimodal machine learning, with a particular focus on multimodal generative models, grounded and embodied semantics, faithful language generation, and interpretable, efficient, and generalizable deep learning.