Data-efficient methods for dialogue systems
Abstract
Conversational User Interfaces (CUIs) have become ubiquitous in everyday life, both in consumer-focused products like Siri and Alexa and in business-oriented customer support automation
solutions. Deep learning underlies many recent breakthroughs in dialogue systems but requires
very large amounts of training data, often annotated by experts, which dramatically increases the cost of deploying such systems in production setups and reduces their flexibility as
software products. When trained on less data, these methods lack robustness
to various phenomena of spoken language (e.g. disfluencies) and to out-of-domain input, and often
generalise poorly to other tasks and domains.
In this thesis, we address the above issues by introducing a series of methods for bootstrapping
robust dialogue systems from minimal data. Firstly, we study two orthogonal approaches to dialogue, a linguistically informed model (DyLan) and a machine learning-based one (MemN2N),
from the data efficiency perspective, i.e. their potential to generalise from minimal data and
their robustness to natural spontaneous input. We outline the steps to obtain data-efficient solutions
with either approach and proceed with the neural models for the rest of the thesis.
We then introduce the core contributions of this thesis, two data-efficient models for dialogue
response generation: the Dialogue Knowledge Transfer Network (DiKTNet) based on transferable latent dialogue representations, and the Generative-Retrieval Transformer (GRTr) combining response generation logic with a retrieval mechanism as the fallback. GRTr ranked first at
the Dialog System Technology Challenge 8 Fast Domain Adaptation task.
Next, we address the problem of training robust neural models from minimal data. Specifically, we look at
robustness to disfluencies and propose a multitask LSTM-based model for domain-general disfluency detection. We then go on to explore robustness to anomalous, or out-of-domain (OOD)
input. We address this problem by (1) presenting Turn Dropout, a data-augmentation technique
that enables training for anomalous input using only in-domain data, and (2) introducing VHCN
and AE-HCN, autoencoder-augmented models for efficient training with turn dropout based on
the Hybrid Code Networks (HCN) model family.
With all the above work addressing goal-oriented dialogue, our final contribution in this thesis
focuses on social dialogue, where the main objective is maintaining a natural, coherent, and engaging conversation for as long as possible. We introduce a neural model for response ranking
in social conversation used in Alana, the 3rd place winner in the Amazon Alexa Prize 2017 and
2018. For our model, we employ a novel technique of predicting the dialogue length as the main
objective for ranking. We show that this approach matches the performance of its counterpart
based on the conventional, human rating-based objective, and surpasses it when given more raw
dialogue transcripts, thus reducing the dependence on costly and cumbersome dialogue annotations.