Securing Language and Multimodal Models: Advances in Backdoor Learning and Efficient AI
Location
New Computer Science-2-Room 220 (50 Seats)
Event Description
Abstract:
Recent advances in deep learning have significantly enhanced the capabilities of Natural Language Processing (NLP) and Vision-Language Models (VLMs). However, these advancements come with increased vulnerabilities, notably through backdoor attacks that pose severe security threats. This thesis addresses two critical dimensions of Trustworthy AI and Efficient Multimodal Representation Learning: (1) security through analyzing, detecting, and designing backdoor attacks in NLP and VLMs, and (2) efficiency through advanced multimodal representation methods tailored for clinical and medical imaging applications.
In the first dimension, we explore the internal mechanisms exploited by backdoor attacks, identifying the distinctive phenomenon of attention focus drifting in compromised transformer models, where trigger tokens consistently hijack attention. Leveraging these insights, we propose robust detection frameworks, including the attention-based Trojan detector (AttenTD) and a task-agnostic logit-based detection method (TABDet), achieving effective identification of backdoored NLP models across diverse tasks. We further introduce novel backdoor attack methodologies: the Trojan Attention Loss (TAL), enhancing attack efficiency and stealth through direct attention manipulation, and BadCLM, demonstrating critical vulnerabilities in clinical decision-support systems by effectively compromising clinical language models.
Extending our security exploration to multimodal settings, we investigate backdoor attacks on VLMs, particularly in complex image-to-text generation tasks. We propose techniques (TrojVLM, VLOOD) capable of embedding backdoors without direct access to the original training data, showcasing practical risks in real-world scenarios.
In the second dimension, we address efficiency and interpretability challenges in clinical and pathology applications. We introduce TCP-LLaVA, the first multimodal large language model (MLLM) designed explicitly for Whole Slide Image (WSI) Visual Question Answering (VQA). Utilizing a novel token compression mechanism inspired by transformer-based models, TCP-LLaVA substantially reduces computational resource consumption while maintaining superior VQA performance across multiple tumor subtypes. Additionally, we present a multimodal transformer model integrating structured Electronic Health Records (EHR) with clinical notes, demonstrating enhanced predictive accuracy and interpretability for in-hospital mortality prediction through integrated gradient-based interpretability methods.
Together, these contributions present a comprehensive approach to ensuring AI models are not only secure against malicious manipulation but also efficient and interpretable for critical clinical applications, underscoring the essential need for trustworthy and effective AI systems.
Speaker: Weimin Lyu
Zoom: https://stonybrook.zoom.us/j/2392326575?pwd=SVQ2VkFXTnZZYmJUMXgvTXBuZWM3UT09
Meeting ID: 239 232 6575
Passcode: 436192