Carnegie Mellon University

Using Multitask Learning to Understand Language Processing in the Brain

thesis
posted on 2022-12-15, 20:20, authored by Daniel Schwartz

Understanding the cognitive processes involved in human language comprehension has been a longstanding goal of the scientific community. While significant progress has been made towards that goal, the processes involved in integrating a sequence of individual word meanings into the meaning of a clause, sentence, or discourse are poorly understood. Recently, the natural language processing (NLP) community has demonstrated that deep language models are, to an extent, capable of representing word meanings and integrating those meanings into a representation that successfully captures the meaning of a sequence. In this thesis, we therefore leverage deep language models as an analysis tool to improve our understanding of human language processing. In the setting of multitask learning, we can gain insight into the mechanisms that deep language models use to make their predictions by comparing tasks to each other. Furthermore, if some of the task predictions we ask the model to make are relevant to cognitive processing (for example, the prediction of eyetracking data measured as participants read sentences), we can ultimately use those insights to better understand language processing in people.

In this work, we first examine the use of constructive interference in multitask learning as an analysis tool. Constructive interference occurs when two tasks are related and a model is constrained by having to accurately predict both. In those cases, the representation the model learns often generalizes better to unseen data than if the model had been trained on just one of the tasks, because the constraint that the representation must support prediction of both tasks provides a helpful inductive bias. If generalization error improves when tasks are trained together, this can be viewed, with caveats, as an indication that the tasks are related. In our experiments, improved generalization error suggests relationships between event-related potential (ERP) components that are consistent with the existing literature, as well as other relationships that bear on the interpretation of ERPs.

Next, we investigate what a deep language model learns when it is trained to predict brain activity recordings. We find that the information encoded into the parameters of the model helps the model predict brain activity, generalizes to the prediction of unseen participants’ brain activity, and, to some degree, generalizes across different brain activity recording modalities. These findings provide evidence that the information the model encodes into its parameters is relevant to the cognitive processes underlying brain activity, and not just to idiosyncrasies in the data, making fine-tuning and multitask learning valid tools for probing those cognitive processes.

Finally, we develop an analysis method in which a model learns a small number of latent functions that take a sequence of words as input and produce a representation from which multiple task outputs must be predicted. We assess task similarities based on the weights that map from the common latent representation to the output associated with each task. The similarities produced by this method capture expected relationships between NLP tasks and can help us understand how a deep language model makes its predictions.
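As a rough, illustrative sketch of this latent-function idea (not the thesis implementation), the code below builds a model in which a shared encoder produces a small number of latent values per word sequence and each task reads those values through its own linear head; task similarity is then computed from the head weights. The encoder architecture, the layer sizes, and the use of cosine similarity over averaged head weights are assumptions made purely for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LatentFunctionModel(nn.Module):
        """Shared latent values over a word sequence, with one linear head per task."""
        def __init__(self, vocab_size=10000, hidden=256, num_latent=8, task_output_sizes=(2, 5, 1)):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.encoder = nn.GRU(hidden, hidden, batch_first=True)
            # A small bottleneck standing in for the latent functions of the sequence.
            self.to_latent = nn.Linear(hidden, num_latent)
            # Each task's head weights map the shared latent values to that task's output.
            self.heads = nn.ModuleList(nn.Linear(num_latent, n) for n in task_output_sizes)

        def forward(self, token_ids):
            embedded = self.embed(token_ids)          # (batch, seq_len, hidden)
            _, final_state = self.encoder(embedded)   # final_state: (1, batch, hidden)
            latent = self.to_latent(final_state[-1])  # (batch, num_latent)
            return [head(latent) for head in self.heads]

    def task_similarity(model):
        # One vector per task: its head weights averaged over output dimensions,
        # then compared pairwise with cosine similarity.
        vectors = [head.weight.mean(dim=0) for head in model.heads]
        return torch.stack([torch.stack([F.cosine_similarity(a, b, dim=0) for b in vectors])
                            for a in vectors])

    model = LatentFunctionModel()
    print(task_similarity(model))  # (num_tasks, num_tasks) matrix; untrained here, so this only shows the mechanics

Any similarity measure over the per-task weight vectors could be substituted; the point of the sketch is that tasks drawing on the same latent values end up with comparable head weights.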
We also examine the similarities between cognition-relevant tasks and NLP tasks, and find that the mechanisms underlying the model’s predictions in cognition-relevant tasks are related to agent-like and patient-like semantic properties and to modifiers in a sentence. The methods developed here can be applied with different sets of tasks to gain different kinds of insight into both deep language models and cognitive processing, and offer a promising direction for understanding language processing in the brain. 
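To make the constructive-interference test described above concrete, here is a minimal sketch, assuming PyTorch and toy random tensors in place of real word features and ERP measurements (this is not the thesis code): a shared encoder is trained either on one target alone or on two targets jointly, and held-out error on the first target is compared between the two settings.

    import torch
    import torch.nn as nn

    def make_model(num_tasks):
        # Shared encoder over (hypothetical) 300-dimensional word features,
        # plus one scalar-output head per task.
        encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
        heads = nn.ModuleList(nn.Linear(128, 1) for _ in range(num_tasks))
        return encoder, heads

    def train(encoder, heads, x, targets, steps=200):
        params = list(encoder.parameters()) + [p for h in heads for p in h.parameters()]
        optimizer = torch.optim.Adam(params, lr=1e-3)
        for _ in range(steps):
            optimizer.zero_grad()
            shared = encoder(x)
            # Joint loss: the shared representation must support every task's prediction.
            loss = sum(nn.functional.mse_loss(h(shared).squeeze(-1), t)
                       for h, t in zip(heads, targets))
            loss.backward()
            optimizer.step()

    # Toy stand-ins for word features and two ERP component amplitudes.
    x_train, x_test = torch.randn(512, 300), torch.randn(128, 300)
    erp_a_train, erp_a_test = torch.randn(512), torch.randn(128)
    erp_b_train = torch.randn(512)

    # Single-task baseline: predict ERP component A alone.
    enc_single, heads_single = make_model(1)
    train(enc_single, heads_single, x_train, [erp_a_train])

    # Multitask model: predict components A and B from the same shared representation.
    enc_multi, heads_multi = make_model(2)
    train(enc_multi, heads_multi, x_train, [erp_a_train, erp_b_train])

    with torch.no_grad():
        err_single = nn.functional.mse_loss(heads_single[0](enc_single(x_test)).squeeze(-1), erp_a_test)
        err_multi = nn.functional.mse_loss(heads_multi[0](enc_multi(x_test)).squeeze(-1), erp_a_test)
    print(float(err_single), float(err_multi))

With real data rather than random tensors, a lower multitask error would be read, with the caveats noted in the abstract, as evidence that the two prediction tasks are related.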

History

Date

2020-08-07

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Tom Mitchell
