Functional Components as a Paradigm for Neural Model Explainability
Despite their ubiquity, trained neural models remain a challenging subject for explainability, with neural net researchers applying what might be considered esoteric knowledge and skills to understand what the models are learning and how their internal workings shape their learning outcomes. Understanding what these models learn is of utmost importance as an increasing number of production systems rely on neural models to provide high-impact utilities.
This work lays out an interpretability methodology, built on a design philosophy for neural models that redefines the unit of analysis from individual neurons to sets of interconnected functional components which we call neural pathways. These functional components, which are a consequence of the architecture, data, and training scheme, have the capacity to cut across structural boundaries. This enables a method of functionally-grounded, human-in-the-loop model understanding through increased transparency, encouraging a dialogue between the models and the researchers.
Over the course of this thesis, we contribute to the literature in four ways. First, we provide a method for neural model interpretability at the subtask level, rigorously validating it against a suite of synthetic datasets. Second, we extend the method with a framework for aligning learned functional components to causal structures; this enables the comparison of a neural model's learned functions with a theoretical causal structure, allowing rapid validation of our understanding of how the model approaches a task. Third, we expand the method to compare and align functional components across models with differing architectures or training procedures. Lastly, we demonstrate the capabilities of the neural pathways approach in several domains of educational technology, including automatic essay feedback via rhetorical structure analysis, group formation via transactivity detection, and automated essay scoring.
This last contribution comprises three facets, distinguished by domain and focus. First, neural pathways are employed to scaffold a neural discourse parser so that it generalizes more easily to student writing. Next, we demonstrate that neural pathways can serve as a method for error analysis by exploring the discrepancy in performance between models trained to detect transactivity in different domains. Lastly, we demonstrate the capability of tracking changes in problematic pathways across the fine-tuning of an AI writing detector.
Given the broad applicability of the neural pathways approach, we are optimistic that the method can have a wide impact on the design and development of neural models, and we aim to provide a foundational work that can be extended far beyond the scope of this thesis.
History
Date
- 2023-11-27
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)