Leveraging Structure and Context for Language-Adjacent Representation Learning
When learning representations from large corpora of language data, the overwhelmingly common strategy is to interpret that data as a collection of IID samples to be modeled in isolation from one another. This approach has benefits: it allows for efficient training via SGD and does not rely on metadata that may not always be present. But it also comes with limitations. Taking advantage of more complex structural links between individual datapoints can let information flow within our corpora, making learned representations more context-sensitive and allowing for heavier parameter sharing that generalizes more easily to examples from unseen class types. In this work, we apply this idea to a variety of settings, largely those that lie at the boundary between language and other modalities, for which much of the existing prior work has not explicitly made use of observable structure within the data. We also show how lower-level modeling choices can add useful inductive bias to our models. In order to retain interpretability and control, we do this using both probabilistic variational learning frameworks and non-variational approaches such as checklist models and retrieval-guided generation.
This dissertation is organized into three parts that apply this broad theme to specific applications. First, we'll examine the task of learning disentangled representations of style and structure in digital fonts, and then apply similar modeling ideas to analyzing the handwriting styles of Linear B scribal hands. In the next part, we'll investigate ways to learn contextualized representations for temporally ordered data in which predictions for one datapoint may influence nearby predictions, such as piano fingering estimation and discursive topic modeling on social media. Finally, we'll put forward new approaches for settings where the input signal is itself multimodal, such as the task of writing descriptive captions for images and music.
History
Date
- 2024-04-04
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)