Carnegie Mellon University

Towards Generalization in Dialog through Inductive Biases

thesis
posted on 2025-03-18, 20:02 authored by Shikib Mehri

Generalization is imperative in dialog research. Data-driven models have been shown to be capable of performing specific tasks in constrained contexts, given ample data. However, the complexity of human communication necessitates that models of dialog be capable of generalizing beyond the limitations of any finite corpus. Models of dialog must be able to generalize to unseen and unforeseen phenomena. This thesis studies four classes of generalization: (1) generalization to new inputs, (2) generalization to new problems, (3) generalization to new outputs and (4) generalization to new dialog tasks. Inductive biases are studied in order to facilitate these four classes of generalization. An inductive bias is motivated by prior knowledge (e.g., domain knowledge, knowledge of the desired generalizations) and aims to influence the abstractions learned by a model in order to induce generalization. Four categories of inductive biases are studied: (1) inductive biases through self-supervised learning, (2) inductive biases in the model architecture, (3) inductive biases in the problem formulation and (4) the task specification as an inductive bias.

The core of this thesis consists of four chapters, each corresponding to one category of inductive bias. Chapter 3 studies self-supervised learning as an inductive bias and validates the use of self-supervised training data and self-supervised objectives as mechanisms for facilitating generalization to new inputs, new problems and new outputs. Chapter 4 incorporates inductive biases into the model architecture, thereby prescribing a specific procedure by which the model infers the output from a given input. Through inductive biases in the model architecture, models are shown to generalize to new inputs/domains and to new outputs. Chapter 5 studies inductive biases in the problem formulation, wherein a problem is reformulated to better align with the capabilities of a pre-trained model. Both dialog evaluation and slot filling are reformulated as response generation tasks, which facilitates zero-shot generalization to new inputs and new outputs. Chapter 6 explores the most challenging class of generalization in dialog: generalization to new tasks. To transfer to unseen tasks (e.g., restaurant reservations) in a zero-shot setting, the task specification is used as an inductive bias. The task specification is a minimal expression of the task-specific properties that must be learned by a data-driven model. This thesis studies several mechanisms for using the task specification as an inductive bias: as an input, during the creation of synthetic data and as part of the model architecture. Leveraging the task specification as an inductive bias results in significant performance gains in zero-shot generalization to unseen tasks.

History

Date

2022-08-18

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Maxine Eskenazi
