Abstract:

In recent years, the landscape of artificial intelligence (AI) has been reshaped by the rapid emergence of Foundation Models (FMs). These versatile models have garnered widespread attention for their remarkable ability to transcend the boundaries of traditional, bespoke AI solutions and to generalize to a large set of downstream tasks. In this presentation, we will describe the development of geospatial FMs trained on Earth observation and weather data and discuss initial results from such models. We will also show how such foundation models can serve as a new and exciting tool for assisting with and accelerating scientific discovery.

Speaker:

Hendrik Hamann
Distinguished Researcher
IBM T.J. Watson Research Center


To truly understand human language, we must look at words in the context of the human generating the language. Factors such as demographics, personality, modes of communication, and emotional states have been shown to play a crucial role in NLP models of the pre-LLM era. Nikita Soni, a PhD student at Stony Brook University co-advised by H. Andrew Schwartz and Niranjan Balasubramanian, will discuss steps toward mathematically defining the inclusion of human context in language modeling, and more. She is the lead organizer of the workshop on human-centered large language modeling.

Please register for the STEM Speaker Series Zoom event here

Please RSVP for the STEM Speaker Series in-person event here


Abstract:
Deep learning models have achieved remarkable success across a wide range of computer vision tasks, including image classification and semantic segmentation. However, this success relies heavily on large amounts of annotated data, which are expensive to obtain. Moreover, model performance often degrades when there are distribution shifts between training and test data. Domain adaptation addresses these issues by transferring knowledge from a label-rich source domain to a related but different target domain. Despite its popularity, domain adaptation remains a challenging task, especially when the distribution shifts are severe and the target domain has no or few labeled data.

In this thesis, I develop four efficient domain adaptation approaches to improve model performance on the target domain. First, inspired by the large-scale pretraining of Vision Transformers, I explore Transformer-based domain adaptation for stronger feature representations and design a safe training mechanism to avoid model collapse when the domain gap is large. Second, I observe that source models have low confidence on target data. To address this, I focus on the penultimate activations of target data and propose an adversarial training strategy to enhance the model's prediction confidence. Third, I study the use of weak supervision from prior knowledge about the target domain's label distribution. A novel Knowledge-guided Unsupervised Domain Adaptation paradigm is devised, and a plug-in module is designed to rectify pseudo labels. Lastly, I turn to the task of Active Domain Adaptation, where labels for a small portion of the target data can be queried. I propose a novel active selection criterion based on local context and devise a progressive augmentation module to better utilize the queried target data. The robustness of domain adaptation approaches, in addition to their accuracy, is critical yet under-explored. To conclude the thesis, I empirically study set prediction in domain adaptation using the tools of conformal prediction and conformal training.
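As background for the set-prediction study mentioned above, here is a minimal sketch of split conformal prediction for classification. This is an illustrative toy, not the thesis's implementation; the function name, the toy Dirichlet data, and the choice of nonconformity score are all assumptions for the example.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: return label sets with ~(1 - alpha) coverage."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Prediction set: every label whose score falls within the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Toy usage: simulated calibration probabilities and two test predictions,
# one confident and one ambiguous.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=100)
cal_labels = cal_probs.argmax(axis=1)  # pretend the model is usually right
test_probs = np.array([[0.90, 0.05, 0.05],
                       [0.40, 0.35, 0.25]])
sets = conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1)
```

The key property, which motivates studying set prediction for robustness, is that the returned sets cover the true label with probability about 1 - alpha regardless of how well-calibrated the model is; ambiguous inputs simply receive larger sets.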


Location: New Computer Science Bldg., Room 120
Zoom Link: https://stonybrook.zoom.us/j/92736258273?pwd=ipDdh1CTG6dRYmqa3ltUvooei8OfaT.1
Meeting ID: 927 3625 8273
Passcode: 466399

Abstract: Self-supervised representation learning (SRL) has emerged as a pivotal advancement in machine learning, offering high-quality data representations without the need for labeled datasets. While SRL has demonstrated enhanced adversarial robustness compared to supervised learning, its resilience against other attack types, particularly backdoor attacks, remains an open question. Recent studies have revealed potential vulnerabilities in SRL, underscoring the necessity for a comprehensive security analysis. However, existing research often extrapolates attacks from supervised learning paradigms, neglecting the unique challenges and opportunities inherent to self-supervised mechanisms.

This thesis proposal aims to address three critical objectives in the realm of self-supervised learning: (1) exploring novel attack vectors, (2) implementing and evaluating practical attacks, and (3) developing robust countermeasures. We focus on two key SRL paradigms: Contrastive Learning and Diffusion Models. For Contrastive Learning, we synthesize existing security vulnerabilities and introduce innovative attack vectors, such as CTRL, to uncover distinctive risks. We conduct a comparative analysis of contrastive and supervised learning approaches in their defense against these threats, exploring potential safeguards and highlighting the limitations of current protective measures in self-supervised contexts. Regarding Diffusion Models, we demonstrate inherent vulnerabilities in their application to adversarial purification.

Our research aims to illuminate the unique challenges posed by emerging attack vectors in self-supervised learning, fostering technical advancements to address underlying security risks in real-world applications. By contributing to the development of more resilient and secure self-supervised representation learning systems, we seek to enhance their reliability and trustworthiness in practical scenarios. This comprehensive examination of SRL's security landscape will provide valuable insights for the broader machine-learning community and pave the way for more robust AI systems.

Join here.

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will also be available via Zoom, and the second half hour will be in-person-only networking.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Tuesday, November 12, 2024, 12:00 pm -- CDS, Bldg. 725, Training Room

Speakers

Carlos Soto, CDS

Yi Huang, CDS

Kevin Yager, CFN

The Stony Brook Computing Society presents an exciting event featuring experts from Google (Danny Rosen - Technical Program Manager) and NVIDIA (Veer Mehta - Senior Solutions Architect), diving into the latest developments in generative AI. Learn how these industry leaders are shaping the future of technology and explore new ideas in a relaxed, engaging setting.

📍 Location: Frey 102
📅 Date: Monday, Nov 11
⏰ Time: 12 PM - 1:50 PM

Scan the QR code or register via the link.


The Provost's Office is excited to invite you to join in responding to an extraordinary opportunity to enhance our academic and research capabilities in AI at Stony Brook. SUNY recently made funding available to support the creation of departments of AI and Society at its universities. Stony Brook is well-positioned to seize this opportunity to build upon our interdisciplinary strengths in AI.

The office is hosting a forum on Friday, Nov. 15, from 11:30 a.m. to 1:30 p.m., in Ballroom A, SAC. You are invited to attend to learn more about this opportunity and to help us generate ideas to build a compelling proposal for Stony Brook to submit to SUNY. Lunch will be provided.

Please click here to RSVP as soon as possible.

This funding will support innovation in our curriculum, allowing us to create programs that explore the social and societal impact of AI alongside the technological advancements led by researchers in engineering and scientific disciplines.

We believe we can make a significant impact through this SUNY program and look forward to your participation in this initiative.

Join Stony Brook University's Center for Excellence in Learning and Teaching (CELT) for a bootcamp on how to use AI to enhance your teaching and courses. This event will demonstrate how ChatGPT, Microsoft Copilot, and other generative AI platforms can support you in crafting learning objectives, writing exam questions, composing rubrics, and designing course content such as lesson plans, in-class activities, instructional videos, and more.

Register here.


Get hands-on with data cleaning techniques using Python and AI tools. Join SBU Libraries' Data Literacies Lead, Ahmad Pratama, to learn how to identify and rectify errors, handle missing data, and prepare your dataset for analysis. This workshop introduces you to powerful yet easy-to-use tools and techniques that make data cleaning efficient and effective, turning chaotic data into valuable insights.
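As a taste of the kinds of steps such a workshop covers, here is a short, hypothetical pandas sketch: the toy dataset, column names, and cleaning choices are illustrative assumptions, not the workshop's actual materials.

```python
import pandas as pd

# Hypothetical messy dataset: inconsistent casing, an unparseable value,
# and a missing entry.
df = pd.DataFrame({
    "city": ["Stony Brook", "stony brook", "STONY BROOK", None],
    "temp_f": ["72", "68.5", "n/a", "75"],
})

# Standardize text: trim whitespace and normalize casing.
df["city"] = df["city"].str.strip().str.title()

# Coerce the numeric column, turning unparseable entries into NaN.
df["temp_f"] = pd.to_numeric(df["temp_f"], errors="coerce")

# Handle missing data: fill numeric gaps with the median,
# drop rows missing a key field.
df["temp_f"] = df["temp_f"].fillna(df["temp_f"].median())
df = df.dropna(subset=["city"])
```

After these steps the three near-duplicate city spellings collapse to one canonical form and every temperature is a usable number, which is the kind of "chaotic data into valuable insights" transformation the workshop describes.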

Please register for the Data Cleaning with Python and AI workshop here.

AI can help you write, you hear. AI can save you time, leverage your skills, enhance your productivity... But you also hear: AI output is not reliable, not adequate for advanced tasks or learning, not ethical to use -- you could get in deep trouble for using AI tools without adequate mastery and caution. Which way is it?
Come join this hands-on workshop where you will explore AI tools and their affordances. Engage in writing tasks to learn how to use AI tools effectively and responsibly.
Sign up for a seat now: https://docs.google.com/forms/d/e/1FAIpQLSd0iDTKkTYnkxFd4LkgqbtP97zQSS4FI_MiPVm7p6IY5SGwSg/viewform