Are you interested in understanding the challenges that lie ahead as Artificial Intelligence (AI) systems become increasingly autonomous, dynamically acquire information, and adapt behaviors?
 
Join us for an exciting afternoon of talks by visionaries and leaders from industry, government, and academia as we kick off a three-part Trusted AI Challenge Series designed to Build the Vision - Formalize Challenges - Advance the Art of the next generation of AI systems.
 
The Air Force Research Laboratory Information Directorate, The State University of New York, Innovare Advancement Center, NYSTEC, and Griffiss Institute invite you to join us for this half-day virtual event!
 
WHEN: Wednesday, October 14, 2020, 12:00 PM - 4:00 PM EDT
 
Hosted by Innovare Advancement Center, this webinar is the first of a three-part series designed to cultivate, define and fund creative solutions to a set of challenge problems in trustworthy AI with a particular focus on dynamic, autonomous systems that learn and adapt behaviors.
 
Keynote speakers include Dr. David Goldstein of SpaceX; Dr. Scott Hubbard of Stanford University; Dr. Pramod Khargonekar of UC Irvine; and more!
 
This event is designed for academic and government researchers, university students, and small businesses.
 
Would you like to understand some of the most formidable technical challenges in future autonomous systems?  Would you like to sponsor some of the brightest minds in AI to work on problems of interest to you? Would you like to learn more about AI in real systems?
 
If so, Save the Date! Wednesday, October 14, 2020, 12:00 PM - 4:00 PM EDT.
 
Please see additional information on the three-part series here. Registration details to follow! 
 
Stay tuned: https://www.innovare.org/news-events  

The Pittsburgh Supercomputing Center is pleased to present a Machine Learning and Big Data workshop.

This workshop will focus on topics including big data analytics and machine learning with Spark, as well as deep learning.

This will be an IN PERSON event hosted by various satellite sites; there WILL NOT be a direct-to-desktop option for this event. SBU's Institute for Advanced Computational Science (IACS) is one of those satellite sites!

Location: IACS Conference Room #2

Interested applicants must first have an ACCESS ID. If you don't have the ID, please visit this page to create one: ACCESS USER REGISTRATION.


Once you have an ACCESS ID, please log in (see top right here) then register here.
Abstract: Recent studies have highlighted the vulnerability of Natural Language Processing (NLP) and Vision-Language Models (VLMs) to backdoor attacks, posing significant security risks. Understanding these attack strategies is crucial for assessing model robustness and developing effective defenses. This thesis proposal aims to investigate the vulnerability of language and vision-language models, analyze abnormal behaviors in backdoor-attacked models, and develop defense methods to enhance the safety of modern machine learning models at deployment.


We investigate the internal mechanisms of backdoored NLP models, identifying a distinct attention focus drifting phenomenon, where trigger tokens hijack attention regardless of the input context. Through comprehensive qualitative and quantitative analysis, we provide insights into the underlying mechanisms that enable backdoor attacks. Building on these insights, we propose detection methods to differentiate backdoored models from clean ones, through inspecting both the attention distribution and the model predictions.

To better understand the vulnerability, we develop advanced backdoor attack strategies targeting language models in classification tasks. For BERT variants, we introduce Trojan Attention Loss (TAL), a novel method that directly manipulates attention patterns to enhance backdoor effectiveness, ensuring stealth and robustness.

Vision-Language Models have demonstrated strong performance in recent years, yet their vulnerability is largely underexplored. We investigate advanced backdoor attack strategies on Vision-Language Models, focusing on image-to-text generation tasks. We demonstrate how backdoors can be embedded in complex multimodal tasks while maintaining semantic integrity under poisoned inputs. Additionally, we propose innovative techniques for injecting backdoors without requiring access to the original training data, expanding the feasibility of real-world attacks.

This proposal provides novel insights into the internal mechanisms of backdoored models, proposes effective detection strategies, and develops advanced attack techniques that expose critical vulnerabilities. These findings underscore the urgent need for robust security measures to defend against emerging backdoor threats in deep learning models. The results have been published in top venues including ICLR, ECCV, NAACL, and EMNLP.
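
As a rough illustration of the attention focus drifting idea described above, the following minimal sketch measures how much attention mass lands on trigger token positions. It is not the detection method from the thesis; the function name `trigger_attention_share`, the toy attention matrices, and the trigger position are all hypothetical and chosen only for illustration.

```python
def trigger_attention_share(attention, trigger_positions):
    """Average fraction of attention that each query token places on trigger positions.

    `attention` is a list of rows, one per query token; each row holds
    non-negative attention weights over the key tokens.
    """
    triggers = set(trigger_positions)
    shares = []
    for row in attention:
        total = sum(row)
        on_trigger = sum(w for i, w in enumerate(row) if i in triggers)
        shares.append(on_trigger / total)
    return sum(shares) / len(shares)


# Toy example (made-up numbers): position 2 plays the role of the trigger token.
clean_attention = [[0.25, 0.25, 0.25, 0.25] for _ in range(3)]
hijacked_attention = [[0.1, 0.1, 0.7, 0.1] for _ in range(3)]

print("clean share:", trigger_attention_share(clean_attention, [2]))
print("hijacked share:", trigger_attention_share(hijacked_attention, [2]))
```

A share that stays high on the same positions across many unrelated inputs would be one possible signal of the hijacking behavior; any threshold for flagging a model is an assumption left to the reader.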

Speaker: Weimin Lyu


Zoom link: https://stonybrook.zoom.us/j/99880605139?pwd=cfWbRG6n9v3GXEa7OqvXa5cOp5eLBv.1
Meeting ID: 998 8060 5139
Passcode: 843302
Abstract: The capacity to adapt machine learning models to various contexts, information, and objectives is particularly valuable. In this thesis, I focus on developing Class Conditional Guided Models. These are models that can be adaptively biased towards a class of interest via a conditional input. My primary focus lies in the efficiency of these models. They are constructed to require training only once, with the ability to quickly and conveniently adapt during testing time without necessitating fine-tuning or retraining.
Firstly, I propose RelationVAE, a novel generative model designed for few-shot scenarios, utilizing the prior knowledge of class similarity relationships. RelationVAE is designed to condition on the embeddings of the neighbor classes (i.e. classes with similarity relationships), to generate more reliable samples by making them more similar to the neighbor class. This enables adaptation of the generative model to the provided prior knowledge about class relationships.
As a second focus, I introduce scGAN, a shadow segmentation technique that enables adaptation to varying shadow distributions in different testing environments. scGAN is designed to condition on a sensitivity parameter, a scalar, to control the amount of the shadow detected. In the testing phase, the parameter is set to appropriate values, allowing the model to quickly adapt to specific test environments.
In my third contribution, I propose S-SEG, a methodology for fine-grained counting allowing adaptation to different granularities of fine-grained classes. In fine-grained problems, the distinction between classes is subtle and inconsistent across images, leading to variations in the granularity of the target class from one image to another. S-SEG is designed to be conditioned on an additional input, the sensitivity parameter, to control the granularities of the target class during inference.
My fourth contribution is a text-to-image synthesis method which allows controlling the number of the generated objects of a target class. I propose to generate an intermediate condition, the density map, which reflects the number of objects, together with their layout. This intermediate condition is used to effectively guide the generative model to generate objects with accurate counts.
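
To make the density map idea concrete, here is a minimal sketch, assuming the common convention that each object contributes a blob of total mass 1 so that the map sums to the object count. The function names, grid size, Gaussian kernel, and object locations are illustrative assumptions, not details taken from the thesis.

```python
import math


def density_map(points, height, width, sigma=1.0):
    """Build a density map in which each object contributes total mass 1."""
    grid = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Unnormalized Gaussian weights centered on the object location.
        weights = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize so this object's blob sums to 1 over the grid.
        for y in range(height):
            for x in range(width):
                grid[y][x] += weights[y][x] / total
    return grid


def estimated_count(grid):
    """The object count is recovered by summing the density map."""
    return sum(map(sum, grid))


# Hypothetical layout: three objects on a 10x10 grid.
objects = [(2, 3), (5, 5), (7, 1)]
grid = density_map(objects, height=10, width=10)
print("estimated count:", round(estimated_count(grid), 6))
```

The map encodes both pieces of information mentioned above: the peaks indicate the layout, and the total mass indicates the count.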

Speaker: Vu Nguyen

Zoom: https://stonybrook.zoom.us/j/97114455337?pwd=Z4rB9dWcstlahUIs8PRrvQ9b2ZK2Df.1
Meeting ID: 971 1445 5337
Passcode: 272300
Virtual Talk: Metadata Matters: Robust Document Classification via Adaptation Methods for Text-driven Public Health by Xiaolei Huang

Zoom link to follow.

Abstract: Document classifiers have been widely applied in solving health-related issues, such as suicide prevention, flu vaccination surveillance and disease diagnosis. However, document metadata including time, gender, age and location has an enormous impact on the robustness of document classifiers. Language varies across the metadata, bringing both challenges and opportunities to build reliable document classifiers. For example, online written language changes over time, and males and females express opinions differently. This talk describes how to use domain adaptation to integrate temporal and user demographic factors into document classifiers. By adapting knowledge of how language varies across the metadata, models can learn generalized representations of language through the metadata-invariant embeddings. This approach will lead to metadata-adapted document classifiers and can also extend to personalize classification models by user embedding.

Bio: Xiaolei Huang is a 4th-year PhD candidate in Information Science at the University of Colorado, Boulder. He is currently a visiting scholar at the Johns Hopkins University. His research interests are in Natural Language Processing, Machine Learning and Public Health. Particularly, he focuses on domain adaptation, cross-lingual transfer learning, user modeling and fairness.

Ready for Round Two? Dr. Zach Justus Returns! Join us on October 30, 2025, in the SBU Hilton Garden Inn. Buckle up your curiosity for a high-energy morning session with the engaging Dr. Zach Justus as we navigate how GenAI is reshaping not just how we teach, but what we teach. With real talk and hard-hitting questions like "Are students learning what we think we're teaching?", this is your chance to rethink your program's true destination. Whether you're looking to pick up a few takeaways or chart a new direction entirely, this symposium is your space to explore, reflect, and act.

Check-in and breakfast will begin at 8:30 a.m. in order to begin our program promptly at 9:00 a.m.

Registration will remain open until October 15 or until the event reaches capacity. If closed, please contact educationaleffectiveness@stonybrook.edu to request a spot on the waitlist.

What can you learn from over seven years' worth of Twitter bios? Steven Skiena, Distinguished Teaching Professor of Computer Science and Director of SBU's Institute for AI-Driven Discovery and Innovation, will tell us.

Presenting work done with collaborators Jason Jones, Dakota Handzlik, and Xingzhi Guo, Dr. Skiena will discuss what the team learned about how people portray themselves on social media through their political identities and job status. He'll also show us what you can predict about a person based on their self-description.

If you have a disability and are requesting accommodations in order to fully participate in this event, please email libraryevents@stonybrook.edu or call 631-632-7100.

Register now: https://library.stonybrook.edu/library-events/stem-speaker-series-measuring-self-identity/

Recently, large-scale language data combined with modern machine learning techniques have shown strong value as a means for studying human psychology and behavior. For example, language alone has been shown to be predictive of mental health, personality, and health behaviors. However, many applications for such language-based assessments have readily available and important data beyond language (i.e., extra-linguistics), such as predicting the subjective well-being of a community using tweets, where one can take into account their age, education, and demographic attributes. Language may capture some characteristics while extra-linguistic variables capture others. We believe that effectively integrating linguistic and extra-linguistic data can yield benefits beyond either independently. In this thesis, we develop methods which effectively integrate extra-linguistic data with language data, focused primarily on social scientific applications. The central challenge is dealing with the size and heterogeneity of the often sparse and noisy language data versus the often low-dimensional and non-sparse extra-linguistic variables. First, we consider structured extra-linguistics, like socioeconomics (income and education rates) and demographics (age, gender, etc.), and propose two integration methods, named residualized controls (RC) and residualized factor adaptation (RFA), to be used in county-wise prediction tasks. Demonstrating techniques that integrate information at both the model level and data level, we found consistently strong improvement over naively combining features, for example, improving county-level well-being predictions by over 12%. Next, we consider unstructured extra-linguistic data.
In the first part, we incorporate social network connections and language over time to propose a novel metric for quantifying the stickiness of words - their ability to spread across friendship connections in a social network over time (or in other words, stick in one's vocabulary after seeing friends use it). We identify which language features are more likely to disseminate through friendship and show such a metric is useful for predicting who will be friends and what content will spread. In addition, we analyze language content over time by proposing a novel dynamic content-specific topic modeling technique that can help to identify different sub-domains of a thematic scope and can be used to track societal shifts in concerns or views over time.


Abstract: In high-dimensional data spaces, vast empty regions often exist where no known data points are present. These empty spaces are not merely gaps but hold untapped potential for discovering novel configurations, optimizing parameters, and improving decision-making processes. However, traditional exploration techniques struggle to identify and leverage these regions due to the curse of dimensionality. To address this, we introduce the Empty Space Search Algorithm (ESA), a scalable, physics-inspired method that systematically identifies and explores these uncharted voids. ESA operates by modeling the data space as a dynamic system, using a repulsion-attraction mechanism to locate optimal empty space configurations (ESCs) without requiring exhaustive search. Building upon ESA, we present GapMiner, a visual analytics system that integrates human-in-the-loop AI to iteratively refine and validate ESCs. GapMiner combines parallel coordinate visualization, interactive optimization, and deep learning-based predictive modeling to enhance the efficiency of empty space exploration. This methodology has broad applications, including accelerating convergence in evolutionary algorithms through a more diverse initial population, optimizing adversarial learning strategies, and discovering novel parameter configurations in reinforcement learning. Our approach demonstrates that empty space is not just an absence of data but a frontier for new possibilities in high-dimensional problem-solving.
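
The repulsion-attraction idea mentioned above can be pictured with a toy two-dimensional sketch. This is not the ESA algorithm itself, whose details are not given in this abstract. It assumes, purely for illustration, an inverse-square repulsion from every data point, a linear pull toward the data centroid, and fixed-size steps along the net force; all function names and constants are hypothetical.

```python
import math


def net_force(probe, data, attraction=1.0):
    """Assumed force model: repulsion from each data point plus a pull to the centroid."""
    fx = fy = 0.0
    for dx, dy in data:
        rx, ry = probe[0] - dx, probe[1] - dy
        dist = math.hypot(rx, ry) + 1e-9  # guard against division by zero
        fx += rx / dist ** 3
        fy += ry / dist ** 3
    cx = sum(d[0] for d in data) / len(data)
    cy = sum(d[1] for d in data) / len(data)
    # The attraction term keeps the probe from drifting away from the data region.
    fx += attraction * (cx - probe[0])
    fy += attraction * (cy - probe[1])
    return fx, fy


def find_empty_spot(data, start, step=0.02, iterations=200):
    """Move a probe in fixed-size steps along the net force direction."""
    probe = start
    for _ in range(iterations):
        fx, fy = net_force(probe, data)
        norm = math.hypot(fx, fy)
        if norm < 1e-12:
            break  # forces balance: the probe is at an equilibrium
        probe = (probe[0] + step * fx / norm, probe[1] + step * fy / norm)
    return probe


def nearest_distance(point, data):
    """Distance from a point to its closest data point."""
    return min(math.hypot(point[0] - x, point[1] - y) for x, y in data)


# Hypothetical data: four points at the corners of the unit square.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
start = (0.1, 0.1)
end = find_empty_spot(data, start)
print("start distance to nearest point:", nearest_distance(start, data))
print("end distance to nearest point:", nearest_distance(end, data))
```

In this toy setup the probe drifts away from the nearby corner and settles near the middle of the square, the region farthest from all data points. A real high-dimensional search would need many probes and a more careful step schedule.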
Bio: Xinyu Zhang received his B.E. in Computer Science from Shandong University, Taishan College, in 2019. He is currently a final-year Ph.D. candidate in the Department of Computer Science at Stony Brook University, advised by Prof. Klaus Mueller. His research focuses on multivariate data analysis, scientific visualization, and reinforcement learning. He has published multiple papers in top-tier journals and conferences, including IEEE TVCG and NeurIPS.
*This seminar will be held in person (food provided on a first-come, first-served basis) and online (Zoom link below)!
Topic: IACS Student Seminar Speaker: Xinyu Zhang
Time: Feb 26, 2025 12:00 PM Eastern Time (US and Canada)
Join Zoom Meeting
https://stonybrook.zoom.us/j/91848218975?pwd=lfITFa61GaXZ2Wsa1B1OnbLQMmXvOE.1

Meeting ID: 918 4821 8975
Passcode: 027337