The overall purpose of this seminar is to bring together people with interests in Computer Vision theory and techniques and to examine current research issues. This course is appropriate for people who have already taken a graduate Computer Vision course or who already have research experience in Computer Vision. To enroll in this course, you must either (1) be in the PhD program or (2) receive permission from the instructors. Each seminar will consist of several short talks (around 15 minutes each) by different students. Students can register for 1 credit for CSE656. Registered students must attend and present a minimum of 2 talks. Everyone else is welcome to attend. Fill in the form at https://forms.gle/q6UG9ygauLp2a8Po8 to subscribe to our mailing list for further announcements.

Abstract: The remarkable success of large foundation models, such as LLMs and diffusion models, is built on their learning from vast amounts of static data from the Internet. However, human learning and problem-solving are fundamentally interactive processes: humans learn by engaging with their environment, tools, search engines, and feedback loops, iteratively refining their understanding and decisions. This gap between the interactivity of human learning and the static nature of model training raises a critical question: how can we imbue foundation models with the capacity for meaningful interaction?

In this talk, I will explore methods to enhance foundation models by incorporating interaction with the external environment. I will discuss strategies such as leveraging external tools, compilers, and function calls to provide dynamic feedback to these models. Drawing inspiration from humans' interactive learning processes, I will demonstrate how interaction-driven learning can lead to models that are not only more accurate but also more adaptable to real-world applications.

This work bridges the gap between static training paradigms and the dynamic, iterative nature of human intelligence, paving the way for a new generation of interactive AI systems.

Bio: Wenhu Chen has been an assistant professor in the Computer Science Department at the University of Waterloo and at the Vector Institute since 2022. He obtained the Canada CIFAR AI Chair Award in 2022 and the CIFAR Catalyst Award in 2024. He has worked for Google DeepMind as a part-time research scientist since 2021. Before that, he obtained his PhD from the University of California, Santa Barbara under the supervision of William Wang and Xifeng Yan. His research interests lie in natural language processing, deep learning, and multimodal learning. He aims to design models that handle complex reasoning scenarios such as math problem-solving and structured knowledge grounding. He is also interested in building more powerful multimodal models to bridge different modalities. He received the Area Chair Award at AACL 2023, the Best Paper Honorable Mention at WACV 2021, the Best Paper Finalist at CVPR 2024, and the UCSB CS Outstanding Dissertation Award in 2021.

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes one short talk on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Embodied Intelligence at Scientific User Facilities

Abstract: This presentation explores the active work integrating artificial intelligence and robotics at the National Synchrotron Light Source II and offers a perspective for the future. Through various case studies, we highlight the optimization of operations, improved experimental outcomes, and the orchestration of distributed multimodal experiments. This ongoing development includes collaborators from across the light and neutron sources in the DOE complex. We will elaborate on the open-source Bluesky project and its capabilities to support adaptive and autonomous experiments. Additionally, we will discuss how Bluesky can be integrated with open-source robotic control software to unlock new flexible automation for autonomous scientific research, which scales to new experiments and continues to leverage human ingenuity.

Biography: Dr. Phillip M. Maffettone is an Associate Computational Scientist in the Data Science and Systems Integration Division at NSLS-II. His research focuses on accelerating scientific discovery at user facilities through the integration of robotics, artificial intelligence (AI), and advanced experiment orchestration systems. He leads the N3XTware project, constructing the software architecture for the next 12 beamlines to be built at NSLS-II. Prior to this, he built the brain of the world's first mobile robotic scientist at the University of Liverpool, and later spearheaded the machine learning platform for a biotechnology start-up, BigHat Biosciences. He holds a DPhil in Inorganic Chemistry from the University of Oxford and a B.S. in Chemical Engineering from the University at Buffalo.

Location: CDS, Bldg. 725, Training Room

Link: https://bnl.zoomgov.com/j/1604971331?pwd=nc5CV3cOFrdYxordFiePW07tIDmwYb.1

Meeting ID: 160 497 1331
Passcode: 289875

The 20th International Conference on Emerging Technologies for a Smarter World (CEWIT 2025)

The Innovation Edge: Harnessing AI for the Future
Exploring Generative AI, Agentic AI, and Frontier Technologies Revolutionizing Healthcare, Defense, Energy, FinTech, and Beyond

Organized by the New York State Center of Excellence in Wireless and Information Technology (CEWIT) at Stony Brook University, our international conference is a destination for researchers, innovators, and entrepreneurs across borders and disciplines. The CEWIT2023 conference attracted over 150 industry and academic participants worldwide. More than twenty-three presenters took the podium in breakout sessions and engaging panel discussions.

Continuing the tradition begun at the inception of our conference in 2003, CEWIT2025 will be a premier forum for presentations of cutting-edge research as well as the exchange and transfer of emerging technologies and innovative applications. We are expecting renowned speakers, presenters, and panelists from industry, academia, and government, beginning with a series of plenary presentations and a keynote, followed by several conversational panels, all for an audience ready to network!


Location: The Center of Excellence in Wireless and Information Technology (CEWIT), Stony Brook University

Event Details: Visit the CEWIT2025 site to learn more about the event

Questions/Concerns: CEWIT Conference Team at 631-216-7114 or info@cewit.org

This virtual presentation series is designed to inform the Stony Brook University research community about the Research Funding Landscape of key topic areas. Our Strategic Research Initiatives team will provide insight into the rapidly shifting funding environment using policy briefs, budgetary priorities, and relevant legislation. We will highlight federal and state priorities in the current and upcoming years to help Stony Brook researchers develop strategies for pursuing funding as that environment changes. This series is moderated by Mónica Bugallo, Interim Vice President for Research & Innovation.

Join us for the third in the series, focused on the artificial intelligence landscape:


Translating the Funding Landscape for Stony Brook Researchers: Artificial Intelligence
Presented by Catherine Chen, Ph.D., Research Development Associate
Faculty Respondent: Assistant Professor Nav Nidhi Rajput, Department of Materials Science and Chemical Engineering
Wednesday, April 22, 2026, from 2 pm to 3 pm

Registration is Required

The University's Main Commencement Ceremony will take place on Friday, May 23, 2025 at 11 am at Kenneth P. LaValle Stadium. Gates open at 10 am.

All guests need a valid ticket to enter LaValle Stadium - no exceptions. Children age 1 and older require a ticket. Seating is first-come, first-served.

Register here.


Reception to follow.

Abstract:
In this talk, I will present our journey of developing diverse, adaptive, uncertainty-calibrated AI planning agents that can robustly communicate and collaborate for multi-agent reasoning (on math, commonsense, coding, etc.) as well as for interpretable, controllable multimodal generation (across text, images, videos, audio, layouts, etc.). In the first part, we will discuss improving reasoning via multi-agent discussion among diverse LLMs and structured distillation of these discussion graphs (ReConcile, MAGDi), adaptively learning to balance abstraction, decomposition, refinement, and fast+slow thinking in LLM-agent reasoning (ReGAL, ADaPT, MAgICoRe, System-1.x), as well as confidence calibration in LLMs via speaker-listener pragmatic reasoning and making LLMs better teammates via multi-agent positive-negative persuasion balancing (LACIE, PBT). In the second part, we will discuss interpretable and controllable multimodal generation via LLM-agent-based planning and programming, such as layout-controllable image generation (and evaluation) via visual programming (VPGen+VPEval), consistent multi-scene video generation via LLM-guided planning (VideoDirectorGPT), interactive and composable any-to-any multimodal generation (CoDi, CoDi-2), as well as feedback-driven multi-agent interaction for adaptive environment/data generation via weakness discovery (EnvGen, DataEnvGym).
Bio:
Dr. Mohit Bansal is the John R. & Louise S. Parker Distinguished Professor and the Director of the MURGe-Lab (UNC-NLP Group) in the Computer Science department at UNC Chapel Hill. He received his PhD from UC Berkeley in 2013 and his BTech from IIT Kanpur in 2008. His research expertise is in natural language processing and multimodal machine learning, with a particular focus on multimodal generative models, grounded and embodied semantics, faithful language generation, and interpretable, efficient, and generalizable deep learning.