Pragmatic Language Understanding and Information Integrity

Event Description

Abstract: In today's digital era, language functions not only as a medium of information transmission but also as a mechanism of persuasion, framing, and control. The proliferation of online platforms has amplified this dual role: while enabling unprecedented access to knowledge, it has also exacerbated challenges such as misinformation, rhetorical manipulation, and cultural and linguistic disparities in information access. As a result, pragmatic language understanding and information integrity have emerged as central concerns for both computational linguistics and society at large. This research traces how claims are produced, reframed, and contested online through three interconnected threads. First, it models pragmatic deflection in discourse by investigating whataboutism, a rhetorical device that deflects criticism by redirecting the conversation, and introduces novel datasets from Twitter (now X) and YouTube. This work underscores how subtle pragmatic maneuvers can erode discourse integrity without relying on outright falsehoods. Second, it advances retrieval and alignment for information integrity in health and news communication. These systems trace claims and narratives across genres (e.g., social posts and news reports) and languages (Chinese and English), linking social posts with journalistic reporting and aligning Chinese news with English biomedical evidence. By accounting for cultural context, assertions can be linked to reliable evidence and organized for systematic comparison. This work surfaces the risks of missing sources, unverifiable claims, and framing disparities in global health discourse, and demonstrates computational solutions that enhance both the credibility and accessibility of information. Third, the methodological centerpiece is Class Distillation (ClaD), a geometry-aware training paradigm for distilling a small, well-defined target class from a large, heterogeneous background.
ClaD couples a distribution-aware contrastive loss (instantiated here in a Mahalanobis form when its assumptions fit the data) with an interpretable decision algorithm tuned for class separation. Evaluated on sarcasm, metaphor, and sexism detection, ClaD matches or surpasses larger models while using fewer computational resources, making these pipelines practical by learning reliably from small, sharply defined classes. In sum, this research presents an integrated account of language understanding in the digital age: it exposes how integrity falters through pragmatic deflection, cross-genre drift, and cross-lingual misalignment; it translates these insights into systems for evidence retrieval, alignment, and verification; and it delivers methods that put pragmatic language use to constructive work.
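To give a flavor of the loss described above, the sketch below shows one way a distribution-aware contrastive objective in Mahalanobis form could look. This is an illustrative assumption, not ClaD itself: the function names, hinge formulation, and margin value are hypothetical, and the actual training objective is defined in the work being presented.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x to the class mean."""
    d = x - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

def distilled_class_loss(target, background, margin=4.0, eps=1e-6):
    """Hypothetical hinge-style contrastive loss: pull target samples
    toward the target-class Gaussian, push background samples beyond
    a Mahalanobis margin. A sketch, not the ClaD objective."""
    mean = target.mean(axis=0)
    # Regularize the covariance so it is invertible for small samples.
    cov = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    cov_inv = np.linalg.inv(cov)
    pull = mahalanobis_sq(target, mean, cov_inv).mean()
    push = np.maximum(0.0, margin - mahalanobis_sq(background, mean, cov_inv)).mean()
    return pull + push

# Toy usage: a tight target cluster vs. a diffuse background.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 0.5, size=(50, 8))       # small, sharply defined class
background = rng.normal(3.0, 2.0, size=(200, 8))  # large, heterogeneous background
loss = distilled_class_loss(target, background)
print(float(loss))
```

Under this framing, the Mahalanobis form only makes sense when the target class is roughly unimodal with a well-conditioned covariance, which matches the abstract's caveat that it is used "when its assumptions fit the data."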

Speaker: Chenlu Wang

Location: (Old) Computer Science Building, Room 2311
