Learning to Converse With Latent Actions
The study of actions has been at the frontier of dialog research since day one. Rooted in speech act theory (Austin, 1962), actions represent the basic unit of communication and define the types of interaction that a dialog agent is capable of. This dissertation begins with the goal of developing domain-agnostic dialog models that can learn to converse with induced action representations. Achieving this first requires the models to be expressive and general-purpose, so that dialog agents can be created in many different domains via the same framework. It then requires the models to produce semantic representations that encode actions in natural conversations and fulfill the requirements of real-world dialog systems. Unfortunately, current methodologies for creating dialog systems are not adequate to achieve this goal. The classical frame-based dialog pipeline assumes that the actions are pre-defined through expert handcrafting, an approach that struggles to generalize to complex domains. More recent end-to-end (E2E) dialog models based on encoder-decoder neural networks are designed not to be restricted by hand-crafted semantic representations. However, it is far from trivial to build a full-fledged dialog system using encoder-decoder models, and they suffer from a range of limitations. Moreover, current E2E models focus only on the final response word outputs and pay little attention to the action representation.
This dissertation advocates a new family of E2E dialog models based on latent actions. Latent actions model the hidden actions in raw conversations as latent variables and make it possible to learn explicit action representations at scale. Concretely, a general latent action framework is defined and detailed, including its desired properties and optimization techniques, and novel solutions are developed to efficiently discover latent actions from large datasets and seamlessly integrate the resulting latent actions into E2E neural dialog models. Four different types of latent actions are then created to address major limitations that current E2E dialog systems face: (1) the dull response problem, where models tend to generate generic responses; (2) poor interpretability, where E2E models cannot be easily interpreted; (3) limited domain generalization, where deep models require a large amount of in-domain training data; and (4) strategy optimization, where it is challenging to apply reinforcement learning to E2E models. This work shows that the above-mentioned challenges can be naturally addressed by using latent actions, and that significant empirical performance gains can be observed. The proposed framework also offers a new perspective for creating E2E dialog models that focus on action representation, which enables new research that connects to other subjects, e.g., sentence representation learning and zero-shot learning. This research is a first step towards bridging classic dialog action research and neural E2E models, and it lays the foundation for building dialog systems that can accomplish more complex tasks and understand and reason as humans do.
- Language Technologies Institute
- Doctor of Philosophy (PhD)