Neural Representation Learning in Linguistic Structured Prediction
Advances in neural network architectures and training algorithms have demonstrated the effectiveness of representation learning in natural language processing. This thesis argues for the importance of modeling discrete structure in language, even when learning continuous representations.
We propose that explicit structure representations and learned distributed representations can be efficiently combined for improved performance over (i) traditional approaches to structured prediction and (ii) uninformed neural networks that ignore all but surface sequential structure. We demonstrate, on three distinct problems, how assumptions about structure can be integrated naturally into neural representation learners for NLP without sacrificing computational efficiency.
First, we propose segmental recurrent neural networks (SRNNs), which, given an input sequence, define a joint probability distribution over segmentations of the input and labelings of the segments. Compared to models that do not explicitly represent segments, such as BIO tagging schemes and connectionist temporal classification (CTC), SRNNs obtain substantially higher accuracies on tasks including phone recognition and handwriting recognition.
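As a rough sketch of the form such a model takes (our notation, not necessarily the thesis's): writing x = x_1, ..., x_T for the input, z = z_1, ..., z_J for a segmentation into contiguous segments, and y = y_1, ..., y_J for the segment labels, a segmental model of this kind can be written as

\[
p(y, z \mid x) \;=\; \frac{1}{Z(x)} \prod_{j=1}^{J} \exp f(y_{j-1}, y_j, z_j, x),
\]

where f scores the j-th labeled segment (for example, using a bidirectional RNN run over the frames the segment spans) and Z(x) sums over all segmentations and labelings, which dynamic programming keeps tractable when segment lengths are bounded.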
Second, we propose dynamic recurrent acyclic graphical neural networks (DRAGNN), a modular neural architecture that generalizes the encoder/decoder concept to include explicit linguistic structures. Linguistic structures guide the construction of the network: the model follows a sequence of transitions and explicitly encodes the partial structures built by those transitions into its hidden-layer activations. We show that this framework is significantly more accurate and efficient than sequence-to-sequence models with attention for syntactic dependency parsing and yields more accurate multi-task learning for extractive summarization.
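As a schematic illustration of the structure-guided recurrence idea, the toy sketch below (hypothetical code, not the thesis's implementation; all names are ours) runs a greedy arc-standard transition loop in which each decision is scored from activations selected by the current partial structure, the top of the stack and the front of the buffer, rather than by fixed sequential positions; a full model would additionally recur over the decision history and train the scores with a loss.

# Hypothetical toy sketch (not the DRAGNN implementation): a greedy
# arc-standard loop in which the features driving each decision are gathered
# from activations indexed by the partial structure built so far.
import numpy as np

rng = np.random.default_rng(0)
T, H = 6, 8                                # sentence length, hidden size
encoder = rng.normal(size=(T, H))          # stand-in for bi-RNN token encodings
W = 0.1 * rng.normal(size=(2 * H, 3))      # scores for SHIFT / LEFT-ARC / RIGHT-ARC

stack, buffer, arcs = [], list(range(T)), []
while buffer or len(stack) > 1:
    # Link features: look up activations by the *partial structure*
    # (top of stack, front of buffer), not by the previous time step alone.
    top = encoder[stack[-1]] if stack else np.zeros(H)
    front = encoder[buffer[0]] if buffer else np.zeros(H)
    scores = np.concatenate([top, front]) @ W

    # Mask transitions that are illegal in the current configuration.
    if not buffer:
        scores[0] = -np.inf                # cannot SHIFT
    if len(stack) < 2:
        scores[1] = scores[2] = -np.inf    # cannot reduce
    action = int(np.argmax(scores))        # training would place a loss here

    if action == 0:                        # SHIFT
        stack.append(buffer.pop(0))
    elif action == 1:                      # LEFT-ARC: second-from-top becomes dependent
        dependent = stack.pop(-2)
        arcs.append((stack[-1], dependent))
    else:                                  # RIGHT-ARC: top becomes dependent
        dependent = stack.pop()
        arcs.append((stack[-1], dependent))

print(arcs)                                # list of (head, dependent) token indices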
Third, we propose discrete stochastic attention to explicitly model alignment structure in neural sequence-to-sequence translation models. We regularize the posterior distributions over the latent alignment decisions using posteriors computed from models that make stronger independence assumptions but share the same latent variables. We show that this posterior regularization scheme leads to substantially improved generalization. Since the posterior regularization objective is generally expensive to compute, we propose several approximations based on importance sampling and find that they match or exceed the exact objective in held-out generalization.
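As a hedged sketch of the general shape of such an objective (our notation; the exact formulation in the thesis may differ), let a denote the latent alignment for a sentence pair (x, y) and q(a | x, y) the alignment posterior under the simpler model. The regularized objective has the form

\[
\mathcal{L}(\theta) \;=\; \log p_\theta(y \mid x) \;-\; \lambda \,\mathrm{KL}\!\big(q(a \mid x, y) \,\big\|\, p_\theta(a \mid x, y)\big),
\]

and the expensive ingredient is the marginal \log p_\theta(y \mid x) = \log \sum_a p_\theta(y, a \mid x), which also appears in the posterior p_\theta(a \mid x, y). An importance-sampling estimate draws a^{(1)}, \dots, a^{(S)} from a proposal r (for instance, q itself) and uses

\[
\log p_\theta(y \mid x) \;\approx\; \log \frac{1}{S} \sum_{s=1}^{S} \frac{p_\theta(y, a^{(s)} \mid x)}{r(a^{(s)})}.
\]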
The techniques proposed in this thesis automatically learn structurally informed representations of the input. Linguistically motivated inductive biases help the neural models learn better representations, and these representations and components can be integrated more readily with other end-to-end deep learning systems within and beyond NLP.
History
Date
- 2017-09-22
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)