Researchers have long recognised that for artificial intelligence to truly collaborate with people, it must accurately anticipate human intentions. Peter Zeng, Weiling Li, and Amie Paige of Stony Brook University, alongside Zhengxiang Wang, Panagiotis Kaliosis, Dimitris Samaras, and colleagues, investigated how Large Visual Language Models (LVLMs) establish ‘common ground’ during communication, a fundamental aspect of human interaction. Using a referential communication experiment and a unique dataset of 356 human and machine dialogues, their new study reveals a significant limitation in LVLMs’ ability to interactively resolve ambiguous references.