On the Systemic Insecurity of Large Language Model Systems: Vulnerabilities in Content, Retrieval, and Reasoning
Event Description
Abstract: Large Language Models (LLMs) have transitioned from standalone prediction interfaces into integrated systems that incorporate content protection, external knowledge retrieval, and multi-step reasoning. While these functional layers expand model capabilities, they also introduce complex, inter-component dependencies that create novel and systemic security risks. This research provides a systematic deconstruction of the structural vulnerabilities emerging across these functional layers.
In this proposal, we evaluate the security boundaries of LLM systems through three pivotal dimensions:
The Content Layer: We present Watermark under Fire, revealing the inherent fragility of content-based tracing mechanisms under adaptive perturbations and highlighting the limitations of surface-level safety measures.
The Retrieval Layer: We introduce GraphRAG under Fire to examine the security of topology-aware knowledge integration. We reveal how graph-based indexing can be exploited as a structural lever for high-success poisoning attacks.
The Reasoning Layer: We detail AutoRAN, the first framework demonstrating the hijacking of internal safety reasoning in Large Reasoning Models (LRMs). This work shows that the transparency of the reasoning process itself creates a critical and exploitable attack surface.
Collectively, these studies demonstrate a systemic failure of add-on safety mechanisms in securing the broader LLM ecosystem. By identifying recurring patterns of exploitation across different system layers, this research provides the necessary foundation for transitioning from reactive patching to a more unified and architecturally grounded approach to AI trustworthiness.
Speaker: Jiacheng Liang
Zoom: https://stonybrook.zoom.us/j/6669990420?pwd=dkY0eEw5YXpPSWo3RUE4OE1oVW90UT09&omn=97367037382
Meeting ID: 666 999 0420
Passcode: 075299