Stony Brook University Presents Eight Papers at the 2023 EMNLP Conference

Empirical Methods in Natural Language Processing, or EMNLP, is a leading conference in the fields of Artificial Intelligence and Natural Language Processing. Organized by the ACL Special Interest Group on Linguistic Data (SIGDAT) and first held in 1996, EMNLP was recently recognized as the second most mentioned conference in Natural Language Processing.

This year's EMNLP conference, which will be held from Dec 6 to Dec 10 in Singapore, received thousands of submissions from several internationally renowned universities and organizations, but only 21.3% of the papers were selected for the main conference. Among other contributions, the conference recognized the work of AI stalwarts at Stony Brook University, accepting eight papers submitted by researchers from the institute.

1. SAGEViz: SchemA GEneration and Visualization
Authors: Sugam Devare, Mahnaz Koupaee, Gautham Gunapati, Sayontan Ghosh, Sai Vallurupalli, Yash Kumar Lal, Francis Ferraro, Nathanael Chambers, Greg Durrett, Raymond Mooney, Katrin Erk and Niranjan Balasubramanian

Event schemas are central to understanding and reasoning about events; they help organize and represent how a complex event might unfold (for example, when a disease outbreak happens, it is likely that an investigation will be launched, and that mitigation steps will follow). But manually creating these schemas is a time- and resource-intensive process. This paper proposes SAGEViz, a human-in-the-loop approach that utilizes automation, visuals, and a plug-and-play model to produce expert-verified schemas across various levels of an event’s hierarchy.

2. Attention-Enhancing Backdoor Attacks Against BERT-based Models
Authors: Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen

This paper explores backdoor attacks on NLP models and introduces a novel method, Trojan Attention Loss (TAL), to improve the efficiency of such attacks. TAL is an attention-enhancing loss function and the first method to enhance backdoor behavior by directly manipulating attention patterns. The authors show that poisoning only 1% of the training data can already achieve a satisfactory attack success rate (ASR).

3. Knowledge Graph Compression Enhances Diverse Commonsense Generation
Authors: EunJeong Hwang, Veronika Thost, Vered Shwartz, Tengfei Ma

Commonsense knowledge graphs like ConceptNet, which cover a wide range of general knowledge and facts, can help AI models generate better commonsense explanations. However, these graphs cover many topics that may not belong together. This paper applies a technique called differentiable graph compression to select the parts of these graphs that are most relevant to a specific task and then injects those concepts into language models that generate text. The result is longer, more diverse, and higher-quality output when generating explanations that involve commonsense reasoning.

4. CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code
Authors: Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

Most commercial off-the-shelf software is closed-source and typically distributed as stripped binaries that lack a symbol table or debug information, in order to ease distribution, protect copyright, and hinder malicious evasion. As a result, even an experienced reverse engineer needs to spend a significant amount of time determining the functionality of an assembly code snippet. This paper presents CP-BCS, a novel binary code summarization framework guided by control flow graphs and pseudo code, and demonstrates that it delivers superior performance and significantly improves the efficiency of reverse engineering.

5. Finding Common Ground: Annotating and Predicting Common Ground in Spoken Conversations
Authors: Magdalena Markowska, Mohammad Taghizadeh, Adil Soubki, Seyed Mirroshandel, Owen Rambow

Communicating with others requires having a common ground: knowing what we believe and what our listeners believe. While the concept of common ground has been widely studied across various disciplines, including linguistics and cognitive science, it has not yet been studied in Natural Language Processing. This paper explores a subset of the concept, presenting a new corpus and discussing baseline experiments for predicting events, beliefs about events, and the achievement of common ground.

6. NORMSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Authors: Yi Fung, Tuhin Chakrabarty, Hao Guo, Owen Rambow, Smaranda Muresan, Heng Ji

Discovering the norm is important for understanding and reasoning about acceptable behaviors and potential violations in human communication and interactions. This paper introduces NORMSAGE, a framework that leverages the expressiveness and implicit knowledge of the pre-trained GPT-3 language model backbone, to address the novel task of conversation-grounded multi-lingual, multi-cultural norm discovery, based on language model prompting and self-verification.

7. GNAT: A General Narrative Alignment Tool
Authors: Tanzir Pial, Steven Skiena

Algorithmic sequence alignment can identify similar segments shared between pairs of documents. However, it is challenging to recognize these segments between distant versions of narratives, such as translations and retellings, particularly for summaries and abridgments that are much shorter than the original novels. This paper presents new methods to identify narrative alignment. The authors apply and evaluate their General Narrative Alignment Tool (GNAT) in summary-to-book alignment, translated book alignment, short story alignment, and plagiarism detection, demonstrating the power and performance of their work.

8. Analyzing Film Adaptation through Narrative Alignment
Authors: Tanzir Pial, Shahreen Aunti, Charuta Pethe, Allen Kim, Steven Skiena

Novels are often adapted into feature films, but the differences between the two media require dropping sections of the text from the movie script. This paper studies the screen adaptation process by constructing narrative alignments and using them to perform an automated analysis of 40 adaptations, revealing insights into the screenwriting process concerning the faithfulness of adaptation, the importance of dialogue, preservation of narrative order, and gender representation issues reflective of the Bechdel test.

Spanning core NLP research, automation, and applications in communication, writing, and film, SBU’s contributions in Artificial Intelligence are novel and industry-leading.

“We’re very proud to be able to share work with the rest of the NLP community,” commented AI Institute Director Steven Skiena. “Having SBU represented at EMNLP makes us want to work harder toward our mission of expanding the uses of AI at Stony Brook and beyond.”

Ankita Nagpal
Communications Assistant