Sparse Models of Natural Language Text
In statistical text analysis, many learning problems can be formulated as a minimization of a sum of a loss function and a regularization function for a vector of parameters (feature coefficients). The loss function drives the model to learn generalizable patterns from the training data, whereas the regularizer plays two important roles: to prevent the models from capturing idiosyncrasies of the training data (overfitting) and to encode prior knowledge about the model parameters.
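In generic notation (the symbols below are illustrative and are not taken from the thesis itself), such a learning problem can be sketched as

\[
\hat{\mathbf{w}} \;=\; \operatorname*{arg\,min}_{\mathbf{w}} \;\sum_{i=1}^{N} L\big(\mathbf{x}_i, y_i; \mathbf{w}\big) \;+\; \lambda\,\Omega(\mathbf{w}),
\]

where \(L\) is the loss on training example \((\mathbf{x}_i, y_i)\), \(\Omega\) is the regularizer, and \(\lambda \ge 0\) trades off fit to the training data against the prior knowledge encoded by \(\Omega\).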
When learning from high-dimensional data such as text, it has been empirically observed that relatively few dimensions are relevant to the predictive task (Forman, 2003). How can we capitalize on this insight and choose which dimensions are relevant in an informed and principled manner? Sparse regularizers provide a way to select relevant dimensions by means of regularization. However, past work has rarely used the regularization function itself to encode non-trivial prior knowledge that yields sparse solutions. This thesis investigates the applications of sparse models—especially structured sparse models—as a medium to encode linguistically-motivated prior knowledge in textual models to advance NLP systems. We explore applications of sparse NLP models in temporal models of text, word embeddings, and text categorization.
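As standard illustrations of sparse and structured sparse regularizers (given here for context; these are textbook penalties, not necessarily the specific ones developed in this thesis), the lasso penalty zeroes out individual coefficients, while the group lasso zeroes out pre-defined groups of coefficients together:

\[
\Omega_{\text{lasso}}(\mathbf{w}) \;=\; \|\mathbf{w}\|_1 \;=\; \sum_j |w_j|,
\qquad
\Omega_{\text{group}}(\mathbf{w}) \;=\; \sum_{g \in \mathcal{G}} \lambda_g \,\|\mathbf{w}_g\|_2,
\]

where \(\mathcal{G}\) is a collection of (possibly overlapping) groups of feature indices and \(\mathbf{w}_g\) is the subvector of \(\mathbf{w}\) restricted to group \(g\). Because entire groups are driven to zero jointly, the choice of groups is one natural place to encode linguistic structure in the sparsity pattern.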
Sparse models come with their own challenges, since new instantiations of sparse models often require a specialized optimization method. This thesis also presents optimization methods for the proposed instantiations of sparse models. Therefore, the goals of this thesis are twofold: (i) to show how sparsity can be used to encode linguistic information in statistical text models, and (ii) to develop efficient learning algorithms to solve the resulting optimization problems.
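To illustrate why specialized solvers are needed (a generic sketch, not the algorithms developed in the thesis): sparsity-inducing penalties such as those above are non-smooth, so plain gradient descent does not apply directly. A standard remedy is a proximal gradient step, which alternates a gradient step on the loss with the proximal operator of the regularizer:

\[
\mathbf{w}^{(t+1)} \;=\; \operatorname{prox}_{\eta\lambda\Omega}\!\Big(\mathbf{w}^{(t)} - \eta\,\nabla_{\mathbf{w}} \textstyle\sum_i L\big(\mathbf{x}_i, y_i; \mathbf{w}^{(t)}\big)\Big),
\qquad
\operatorname{prox}_{\eta\lambda\Omega}(\mathbf{v}) \;=\; \operatorname*{arg\,min}_{\mathbf{u}} \tfrac{1}{2}\|\mathbf{u}-\mathbf{v}\|_2^2 + \eta\lambda\,\Omega(\mathbf{u}).
\]

For the lasso penalty this proximal operator reduces to coordinate-wise soft-thresholding, but for structured penalties it can be much harder to compute, which is one reason new instantiations of sparse models call for new optimization methods.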
History
Date
- 2015-04-15
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)