Encoding and Decoding Graph Representations of  Natural Language

Thomson, Sam

doi:10.1184/R1/24943482.v1

Encoding and Decoding Graph Representations of Natural Language

thesis

posted on 2024-01-12, 21:14 authored by Sam Thomson

When analyzing text, it is often helpful to process the text into richer structures representing its linguistic content. Labeled directed graphs are a natural and flexible representation, especially for natural language semantics. Their generality over trees, for instance, allows them to represent relational semantics while handling phenomena like coreference and coordination.

This thesis focuses on such graphical representations of text. We develop novel algorithms and models for decoding—automatically extracting graphs from raw text; and encoding—transforming graphs into tensors so that they can be used as input to a downstream neural network. We primarily focus on semantic graphs—graphs that represent sentence meanings. Semantic graphs are not necessarily trees, so they require methods that can handle arbitrary graphs. We make three main contributions. First, we introduce neural Turbo (Neurbo) parsing and show that it is an effective approach for semantic dependency parsing, and in particular multitask semantic dependency parsing. Next, we show that the task of decoding connected semantic graphs can be formulated as a Node-and-Edge-weighted Connected Subgraph (NEWCS) problem, a novel extension of a classic, well-studied problem in network optimization called the prize-collecting Steiner tree (PCST) problem. This allows us to prove theoretical properties about decoding under connectedness constraints, and to adapt and extend existing PCST solving techniques to decoding semantic graphs. Finally, we introduce new lawful neural encoders whose constrained forms allow for efficient exact inference by dynamic programming. We introduce two models: Soft Patterns (SoPa), which encode sequences, and Kleene Graph Encoder Networks (KGEN), which encode all paths in a graph, grouped by source and destination.

History

Date

2019-05-03

Degree Type

Dissertation

Department

Language Technologies Institute

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Noah Smith

Usage metrics

Keywords

encoding decoding graph representations natural language Natural Language Processing

Licence

CC BY 4.0