Carnegie Mellon University

Learning Structured Neural Semantic Parsers

thesis
Posted on 2023-08-22, 21:02; authored by Pengcheng Yin

Semantic parsing, the task of translating user-issued natural language (NL) utterances (e.g., Flights from Pittsburgh to New York) into formal meaning representations (MRs, e.g., an SQL database query or a Python program), has become an important direction in developing natural language interfaces to computational systems. Recent years have seen a surge in the application of neural network-based semantic parsers to a variety of tasks and domains. However, meaning representations typically exhibit strong syntactic structure and are defined according to domain-specific structured knowledge schemas (e.g., a database schema or Python API specification), neither of which is easily captured by standard neural sequence transduction models. Neural semantic parsers are also data-hungry, requiring non-trivial manual annotation effort by domain experts. These issues limit the range of applications a neural semantic parser can support and impede its extension to broader scenarios, especially those whose meaning representations have diverse and complex structure.

In this thesis, we explore neural semantic parsing models that can better capture the structure in various types of logical formalisms and knowledge schemas, and we provide approaches to mitigate the cost of acquiring labeled data. The dissertation consists of three parts. The first part introduces a general-purpose parsing model with built-in syntactic knowledge of the grammatical structure of meaning representations. The second part investigates approaches to encoding the structured information in domain knowledge schemas (e.g., database tables) that is useful for understanding user-issued utterances. Specifically, we focus on grounding elements in the schema (e.g., columns like departure_city in database tables, or functions like GetFlight(from=GetCityByName(·)) in API specifications) to their corresponding NL constituents (e.g., from Pittsburgh) in utterances. Finally, the third part aims to improve the data efficiency of semantic parsers via semi-supervised learning, and develops machine-assisted approaches to accelerate training data acquisition.

History

Date

2021-08-13

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Graham Neubig
