Communication Techniques in Large Language Model Argumentation and Reasoning
Event Description
Abstract: Recent work in NLP uses debates between multiple LLMs to arrive at more accurate conclusions. Earlier work on chain-of-thought prompting also showed improvements in accuracy when the model is asked to provide step-by-step reasoning in its response. Many publications since have developed strategies to improve the reasoning in model output, with the goal of generating a more accurate result. However, even when models are asked to provide problem-solving steps, the content of the reasoning they produce is not well studied for all tasks, and it sometimes contains errors or conflicting statements even when the final result is correct. In fact, evaluations across reasoning tasks offer evidence that LLMs are not learning how to reason but are instead mimicking relevant solutions from their training sets.
By studying and evaluating the argumentation that LLMs provide, we can determine factors that may benefit or hinder a model's ability to give a complete, cohesive, and thorough answer. While there are signs that LLMs pattern match, finding where, when, and why this fails is valuable, as there may be ways to help the model imitate solutions that are more relevant to the task it is attempting to solve. Determining when pattern matching is not enough could reveal an area of improvement for future generations of LLMs. This research may separately aid work on human-AI agent and inter-agent interaction. Specifically, frameworks could be used to determine when and why other models or humans are convinced by LLM-generated responses, and which argumentation methods cause other models to change their responses. Our current research on systematic versus heuristic cues shows that large language models sometimes present systematic or heuristic reasoning patterns depending on how they are prompted. Future research aims to explore other methods of classifying argumentation.
Speaker: Kiera Gross
Joining link: https://meet.google.com/xae-ywpv-udo