Situated grounding and understanding of structured low-resource expert data
Katsakioris, Miltiadis Marios
Conversational agents are becoming more widespread, ranging from social to goal-oriented to multi-modal dialogue systems. However, for systems with both visual and spatial requirements, such as situated robot planning, developing accurate goal-oriented dialogue systems can be extremely challenging, especially in dynamic environments such as underwater operations or first response. Furthermore, training data-driven algorithms in these domains is difficult due to the esoteric nature of the interaction, which requires expert input. We derive solutions for creating a collaborative multi-modal conversational agent for setting high-level mission goals. We experiment with state-of-the-art deep learning models and techniques and create a new data-driven method (MAPERT) that is capable of processing language instructions by grounding the necessary elements using various types of input data (vision from a map, text and other metadata). The results show that the accuracy of data-driven systems can vary dramatically depending on the task, the type of metadata, and the attention mechanisms used. Finally, because we deal with low-resource expert data, we adopt a Continual Learning and Human-in-the-Loop methodology, with encouraging results.