Abstract: Pre-trained diffusion and flow matching models have made visual generation remarkably powerful, enabling high-fidelity synthesis of images and videos from natural language prompts. However, their behavior is still largely dictated by the pre-training data distribution and likelihood objective, which do not directly encode downstream desiderata such as fine-grained semantic alignment, controllability, or realism. This gap motivates post-training: starting from a base generator and further optimizing it with additional supervision signals derived from human or reward model preferences.
This work presents post-training for visual generative models through two complementary case studies. First, Hummingbird addresses the problem of fine-grained contextual alignment in image-text-to-image generation. We introduce a multimodal context evaluator that scores the consistency between rich contextual descriptions and generated images, capturing fine-grained alignment beyond global CLIP similarity. By directly backpropagating these differentiable rewards through the diffusion sampler, Hummingbird substantially improves semantic faithfulness while preserving high visual quality.
Second, PISCES tackles post-training for text-to-video generation, where alignment is inherently semantic-spatio-temporal. We show that naive VLM-based rewards suffer from distributional mismatch and token-level misalignment, leading to reward hacking and suboptimal optimization. PISCES introduces a bi-objective, Optimal Transport (OT)-aligned reward module: distributional OT using Neural Optimal Transport to align text and video embedding distributions, and discrete, partial OT over a spatio-temporal cost matrix to capture semantic alignment at the token level. These rewards are integrated into both direct backpropagation and GRPO-style optimization to post-train state-of-the-art text-to-video generators. Together, Hummingbird and PISCES provide a unified view of how carefully designed visual reward models, coupled with OT-based representation alignment, can reliably improve the downstream behavior of pre-trained image and video generators.

Speaker: Minh Quan Le

Location: NCS 220

Zoom: https://stonybrook.zoom.us/j/94798224254?pwd=CFraer25qnpORbJ14aAVHRwaSJOjJM.1

What AI tools are available to help with the scholarly research process? Are they helpful? What do they do, and is it worth the time and energy to try them out? Join librarian Christine Fena to explore and compare established and emerging AI research tools such as Elicit, Scite, Consensus, and Undermind. The workshop will not offer a lengthy tutorial on how to use any of these tools, but it will provide a starting point for understanding what they are, what new ones are emerging, and how AI research assistants might change your search process. All are welcome!

Register for this Zoom workshop.

Join us as we celebrate this year's Brook & Beyond Challenge finalists.
The Office for Research and Innovation invites you to hear about the two-month journey in which the Brook & Beyond team supported eight cohorts in bringing their bold ideas from the lab to the marketplace. It's an energizing evening that highlights the collaboration, creativity, and entrepreneurial spirit driving discovery across the University.
Meet this year's award recipients, hear pitches from the emerging founders, and applaud their achievements.
Connect, celebrate, and be part of the momentum shaping the future of innovation at Stony Brook University.
Refreshments will be served. Registration is required.
Register Here.

Join us to share your thoughts about teaching, learning, and AI!

The landscape of higher education is rapidly evolving with the integration of Artificial Intelligence (AI). Through the Institute on AI, Pedagogy, and the Curriculum with AAC&U, we are exploring ways that we can better address AI in teaching and learning. We want to hear your experiences, your concerns, and your ideas.

This is an open discussion for all faculty and staff to share their perspectives on the opportunities and challenges AI presents in our academic environment.

We'll be exploring critical questions like:

  • In the age of AI, what are the opportunities you see for enriching the classroom and curriculum? How can it enhance student learning or your professional practice?

  • What are the most significant challenges and concerns that AI raises for you regarding academics, student integrity, or your workload?

  • What resources (tools, training, technical support, policy guidance, etc.) do you need to feel confident and successful in the age of AI?

Dates/Times:

  • Tuesday, 2/3 at 2pm

  • Friday, 2/6 at 9:30am

Please register in advance for the Zoom link.

Can't Make It? Share Your Feedback!

We understand schedules are tight. If you cannot attend the live discussion, you can still share your thoughts! Join our AI Zoom Room to leave a video recording, or email rose.tirotta-esposito@stonybrook.edu with your comments and ideas.

Videos will not be shared publicly and comments will only be shared in aggregate.

Your input is vital. From pedagogy to assessment, your insights will be critical. We look forward to a thoughtful and productive conversation!

  • Dr. Rose Tirotta-Esposito (Assistant Provost; Director of CELT)

  • Dr. Elizabeth Hewitt (Associate Professor in the Department of Technology and Society (DTS) in the College of Engineering and Applied Sciences)

  • Chris Kretz (Associate Librarian and Head of Academic Engagement at SBU Libraries)

  • Prof. Rajiv Lajmi (Assistant Professor in the School of Health Professions and Chair of Applied Health Informatics)

  • Dr. Matthew Salzano (Assistant Professor in the Department of Communication in the School of Communication and Journalism)

The Challenges of Machine Learning in Adversarial Settings by Patrick McDaniel, Pennsylvania State University

Abstract: Advances in AI and machine learning have enabled new applications and services to interpret and process inputs in previously unthinkable complex environments. Autonomous cars, data analytics, adaptive communication and self-aware software systems are now revolutionizing markets by achieving or exceeding human performance. In this talk, I consider the evolving use of machine learning in security-sensitive contexts and explore why many systems are vulnerable to nonobvious and potentially dangerous manipulation. Here, we examine sensitivity in any application whose misuse might lead to harm: for instance, forcing an adaptive network into an unstable state, crashing an autonomous vehicle, or bypassing an adult content filter. I explore the use of machine learning in this area, particularly in light of recent discoveries in the creation of adversarial samples and defenses against them, and speculate on future attacks on machine learning. The talk concludes with a discussion of the technological and societal challenges we face as a result of current and future advances in intelligent computing.

Bio: Patrick McDaniel is the William L. Weiss Professor of Information and Communications Technology and Director of the Institute for Networking and Security Research in the School of Electrical Engineering and Computer Science at the Pennsylvania State University. Professor McDaniel is also a Fellow of the IEEE and ACM and the director of the NSF Frontier Center for Trustworthy Machine Learning. He also served as the program manager and lead scientist for the Army Research Laboratory's Cyber-Security Collaborative Research Alliance from 2013 to 2018. Patrick's research focuses on a wide range of topics in computer and network security and technical public policy. Prior to joining Penn State in 2004, he was a senior research staff member at AT&T Labs-Research.