Learning to handle miscommunication in multi-modal conversational AI

dc.contributor.advisor: Suglia, Assistant Professor Alessandro
dc.contributor.advisor: Eshghi, Associate Professor Arash
dc.contributor.author: Chiyah-Garcia, Javier
dc.date.accessioned: 2026-01-15T18:31:32Z
dc.date.issued: 2025-04
dc.description.abstract: In human communication, we continuously negotiate shared understanding and deal with misunderstandings as they arise to achieve mutual coordination. However, despite the ubiquity and importance of misunderstandings and repairs in dialogue, conversational AI often struggles to process them effectively, limiting its ability to collaborate with humans through natural language. This thesis explores how to develop robust models for processing miscommunications in situated collaborative tasks. We first collect a dialogue corpus to study human-agent coordination in an ambiguous environment, finding that models struggle to resolve referring expressions. To address this shortcoming, we design and train models to ground referring expressions and detect ambiguities, learning strong multi-modal representations in situated dialogues. We then analyse the signals required for models to learn to handle miscommunications, and propose a cross-modal taxonomy of clarifications to assess the contribution of distinct modalities. Our experiments with different model architectures and training objectives reveal that secondary objectives are essential to integrate multiple modalities (dialogue, visual and relational), leading to models better suited to deal with challenging clarifications in conversations. Finally, we evaluate how generative multi-modal LLMs handle both miscommunications and repairs by releasing a new benchmark, BlockWorld-Repairs, based on human data collections and studies. We then propose alternative training approaches that encourage models to learn from interactive settings, generalising to handle both instructions and subsequent repairs for successful task completion. Throughout this thesis, we highlight the challenges posed by miscommunications and present approaches to develop robust collaborative conversational AI models better adapted to human interaction.
dc.identifier.uri: https://www.ros.hw.ac.uk/handle/10399/5238
dc.language.iso: en
dc.publisher: Heriot-Watt University
dc.publisher: Mathematical and Computer Sciences
dc.title: Learning to handle miscommunication in multi-modal conversational AI
dc.type: Thesis

Files

Original bundle

Name: Chiyah-GarciaJ_0425_macsSS.pdf
Size: 25.08 MB
Format: Adobe Portable Document Format