Carnegie Mellon University

Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems

thesis
posted on 2022-12-13, 21:50 authored by Yun-Nung Chen

Various smart devices (smartphones, smart TVs, in-car navigation systems, etc.) are incorporating spoken language interfaces, also known as spoken dialogue systems (SDS), to help users finish tasks more efficiently. The key component of a successful SDS is spoken language understanding (SLU); in order to capture language variation from dialogue participants, the SLU component must create a mapping between natural language inputs and semantic representations that correspond to users’ intentions.

The semantic representation must include “concepts” and a “structure”: concepts are domain-specific topics, and the structure describes relations between concepts and conveys intention. Most knowledge-based approaches originated in the field of artificial intelligence (AI). These methods leveraged deep semantics and relied heavily on rules and symbolic interpretations, which mapped sentences into logical forms: context-independent representations of a sentence covering its predicates and arguments. However, most prior work focused on learning a mapping between utterances and semantic representations, where such organized concepts still remain predefined. The need for predefined structures and annotated semantic concepts results in extremely high cost and poor scalability in system development. Thus, current technology usually limits conversational interactions to a few narrow predefined domains/topics. Because the number of domains used in various devices is increasing, this dissertation aims to fill the gap by improving the generalization and scalability of building SDSs with little human effort.

To achieve this goal, two questions need to be addressed: 1) Given unlabeled conversations, how can a system automatically induce and organize the domain-specific concepts? 2) With the automatically acquired knowledge, how can a system understand user utterances and intents? To tackle the above problems, we propose to acquire domain knowledge that captures humans’ salient semantics, intents, and behaviors. Then, based on the acquired knowledge, we build an SLU component to understand users.

The dissertation focuses on several important aspects of the above two problems: Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, and Intent Prediction. To solve the first problem of automating knowledge learning, ontology induction extracts domain-specific concepts, and then structure learning infers a meaningful organization of these concepts for SDS design. With the structured ontology, surface form derivation learns natural language variation to enrich its understanding cues. For the second problem of how to effectively understand users based on the acquired knowledge, we propose to decode users’ semantics and to predict intents about follow-up behaviors through a matrix factorization model, which outperforms other SLU models.

Furthermore, the dissertation investigates the performance of SLU modeling for human-human conversations, where two tasks are discussed: actionable item detection and iterative ontology refinement. For actionable item detection, human-machine conversations are utilized to learn intent embeddings through convolutional deep structured semantic models for estimating the probability that actionable items appear in human-human dialogues. For iterative ontology refinement, ontology induction is first performed on human-human conversations and achieves performance similar to that on human-machine conversations. The integration of actionable item estimation and ontology induction induces an improved ontology for manual transcripts. Also, the oracle estimation shows the feasibility of iterative ontology refinement and the room for further improvement.

In conclusion, the dissertation shows the feasibility of building a dialogue learning system that is able to understand how particular domains work based on unlabeled human-machine and human-human conversations. As a result, an initial SDS can be built automatically according to the learned knowledge, and its performance can be iteratively improved by interacting with users in practical usage, showing great potential for reducing human effort during SDS development.

History

Date

2016-01-06

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Dr. Alexander I. Rudnicky, Dr. Anatole Gershman
