Encoding and Decoding Graph Representations of Natural Language
When analyzing text, it is often helpful to process the text into richer structures that represent its linguistic content. Labeled directed graphs are a natural and flexible representation, especially for natural language semantics. Unlike trees, for instance, they can represent relational semantics while handling phenomena such as coreference and coordination.
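To make the representation concrete, here is a minimal Python sketch (not code from the thesis; the class and the PropBank-style labels are invented for illustration) of a labeled directed graph for "The dog wants to bark." The node for "dog" is an argument of both "wants" and "bark", giving it in-degree two, a configuration no tree can express.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticGraph:
    """A labeled directed graph over the content words of a sentence."""
    nodes: list[str]
    # Maps a directed (head, dependent) pair of node indices to an edge label.
    edges: dict[tuple[int, int], str] = field(default_factory=dict)

    def add_edge(self, head: int, dep: int, label: str) -> None:
        self.edges[(head, dep)] = label

# "The dog wants to bark": "dog" is the ARG1 of both predicates,
# so the graph is reentrant and therefore not a tree.
g = SemanticGraph(nodes=["dog", "wants", "bark"])
g.add_edge(1, 0, "ARG1")  # dog is the one who wants
g.add_edge(1, 2, "ARG2")  # barking is what is wanted
g.add_edge(2, 0, "ARG1")  # dog is also the one who barks
```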
This thesis focuses on such graph representations of text. We develop novel algorithms and models for decoding (automatically extracting graphs from raw text) and for encoding (transforming graphs into tensors so that they can be used as input to a downstream neural network). We focus primarily on semantic graphs, which represent sentence meanings. Semantic graphs are not necessarily trees, so they require methods that can handle arbitrary graphs. We make three main contributions. First, we introduce neural Turbo (Neurbo) parsing and show that it is an effective approach to semantic dependency parsing, in particular multitask semantic dependency parsing. Second, we show that the task of decoding connected semantic graphs can be formulated as a Node-and-Edge-Weighted Connected Subgraph (NEWCS) problem, a novel extension of a classic, well-studied problem in network optimization: the prize-collecting Steiner tree (PCST) problem. This formulation allows us to prove theoretical properties of decoding under connectedness constraints, and to adapt and extend existing PCST solution techniques to the decoding of semantic graphs. Finally, we introduce lawful neural encoders, whose constrained forms admit efficient exact inference by dynamic programming. We present two such models: Soft Patterns (SoPa), which encode sequences, and Kleene Graph Encoder Networks (KGEN), which encode all paths in a graph, grouped by source and destination.
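As a toy illustration of the NEWCS objective (a brute-force enumeration written for this summary, not the PCST-based techniques the thesis adapts), the sketch below searches all edge subsets of a small weighted graph for the connected subgraph with the highest total node-plus-edge weight. All names and weights are invented; real decoders avoid this exponential search.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Weak (undirected) connectivity check via depth-first search."""
    if not nodes:
        return False
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return seen == set(nodes)

def newcs_brute_force(node_w, edge_w):
    """node_w: {node: weight}; edge_w: {(head, dep): weight}.
    Returns the highest-scoring connected subgraph as (nodes, edges, score).
    Single-node subgraphs are omitted for brevity."""
    all_edges = list(edge_w)
    best = (set(), set(), float("-inf"))
    for r in range(1, len(all_edges) + 1):
        for chosen in combinations(all_edges, r):
            nodes = {v for e in chosen for v in e}
            if not is_connected(nodes, chosen):
                continue  # enforce the connectedness constraint
            score = sum(node_w[v] for v in nodes) + sum(edge_w[e] for e in chosen)
            if score > best[2]:
                best = (nodes, set(chosen), score)
    return best

# Hypothetical weights: the best solution keeps all three nodes but drops
# the negative-weight edge, since the subgraph stays connected without it.
node_w = {"dog": 1.0, "wants": 2.0, "bark": 0.5}
edge_w = {("wants", "dog"): 1.5, ("wants", "bark"): -0.2, ("bark", "dog"): 0.7}
print(newcs_brute_force(node_w, edge_w))
```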
History
Date
- 2019-05-03
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)