Carnegie Mellon University
Browse

Encoding and Decoding Graph Representations of Natural Language

Download (1.39 MB)
thesis
posted on 2024-01-12, 21:14 authored by Sam Thomson

 When analyzing text, it is often helpful to process the text into richer structures representing its linguistic content. Labeled directed graphs are a natural and flexible representation, especially for natural language semantics. Their generality over trees, for instance, allows them to represent relational semantics while handling phenomena like coreference and coordination.

 This thesis focuses on such graphical representations of text. We develop novel algorithms and models for decoding—automatically extracting graphs from raw text; and encoding—transforming graphs into tensors so that they can be used as input to a downstream neural network. We primarily focus on semantic graphs—graphs that represent sentence meanings. Semantic graphs are not necessarily trees, so they require methods that can handle arbitrary graphs. We make three main contributions. First, we introduce neural Turbo (Neurbo) parsing and show that it is an effective approach for semantic dependency parsing, and in particular multitask semantic dependency parsing. Next, we show that the task of decoding connected semantic graphs can be formulated as a Node-and-Edge-weighted Connected Subgraph (NEWCS) problem, a novel extension of a classic, well-studied problem in network optimization called the prize-collecting Steiner tree (PCST) problem. This allows us to prove theoretical properties about decoding under connectedness constraints, and to adapt and extend existing PCST solving techniques to decoding semantic graphs. Finally, we introduce new lawful neural encoders whose constrained forms allow for efficient exact inference by dynamic programming. We introduce two models: Soft Patterns (SoPa), which encode sequences, and Kleene Graph Encoder Networks (KGEN), which encode all paths in a graph, grouped by source and destination.  

History

Date

2019-05-03

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Noah Smith

Usage metrics

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC