Researchers have long recognised that for artificial intelligence to truly collaborate with people, it must accurately anticipate human intentions. Peter Zeng, Weiling Li, and Amie Paige of Stony Brook University, together with Zhengxiang Wang, Panagiotis Kaliosis, Dimitris Samaras, and colleagues, investigated how Large Visual Language Models (LVLMs) establish 'common ground' during communication, a fundamental aspect of human interaction. Their new study, built around a referential communication experiment, reveals a significant limitation in LVLMs' ability to interactively resolve ambiguous references, drawing on a unique dataset of 356 human and machine dialogues. This work matters because it pinpoints a key deficit standing in the way of seamless human-AI partnership and provides valuable resources, including data and analysis tools, for improving the modelling of shared understanding in AI systems.
LVLMs Struggle Across 356 Dialogues, Revealing Limits in Human-Machine Communication