Carnegie Mellon University

Functional Components as a Paradigm for Neural Model Explainability

thesis
posted on 2024-02-21, 15:25 authored by James Fiacco

Despite their ubiquity, trained neural models remain a challenging subject for explainability, with neural network researchers applying what might be considered esoteric and arcane knowledge and skills to understand what the models are learning and how the internal workings of the models change their learning outcomes. Understanding what these models are learning is of utmost importance as production systems increasingly rely on neural models to provide high-impact utilities.

This work lays out an interpretability methodology, built on a design philosophy for neural models that redefines the unit of analysis for these models from individual neurons to sets of interconnected functional components, which we call neural pathways. These functional components, which are a consequence of the architecture, data, and training scheme, have the capacity to cut across structural boundaries. This enables a method of functionally grounded, human-in-the-loop model understanding through increased transparency, encouraging a dialogue between the models and the researchers.
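
To make the pathway notion concrete, here is a minimal illustrative sketch, not the procedure used in the thesis: one plausible way to recover candidate functional components is to pool hidden units from all layers and cluster them by the correlation of their activations over a probe set, so that a component may cut across layer boundaries. The function name `find_components` and its parameters are hypothetical.

```python
# Illustrative sketch only: group neurons into cross-layer "functional
# components" by clustering their activation profiles on a probe set.
# This is an assumption-laden toy, not the thesis's actual procedure.
import numpy as np
from sklearn.cluster import AgglomerativeClustering  # sklearn >= 1.2

def find_components(activations: np.ndarray, n_components: int = 4):
    """activations: (n_examples, n_neurons), where the neuron axis
    concatenates units from every layer so clusters can span layers."""
    # Correlation between neurons across the probe examples.
    corr = np.corrcoef(activations.T)      # (n_neurons, n_neurons)
    distance = 1.0 - np.abs(corr)          # strong co-activation -> small distance
    labels = AgglomerativeClustering(
        n_clusters=n_components,
        metric="precomputed",
        linkage="average",
    ).fit_predict(distance)
    # Each "pathway" is the set of neuron indices sharing a cluster label.
    return {c: np.flatnonzero(labels == c) for c in range(n_components)}

# Toy usage: 200 probe examples, 64 neurons pooled across layers.
rng = np.random.default_rng(0)
acts = rng.standard_normal((200, 64))
pathways = find_components(acts)
```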

Over the course of this thesis, we contribute to the literature in four ways. First, we provide a method for neural model interpretability at the subtask level, rigorously validating it against a suite of synthetic datasets. Second, we extend the method with a framework for aligning learned functional components to causal structures; this enables the comparison of a neural model's learned functions with a theoretical causal structure, allowing for rapid validation of our understanding of how the model is approaching a task. Third, we expand the method to compare and align functional components across models with differing architectures or training procedures. Lastly, we demonstrate the capabilities of the neural pathways approach in several domains of educational technology, including automatic essay feedback via rhetorical structure analysis, group formation via transactivity detection, and automated essay scoring.
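
As a hedged illustration of the second contribution's flavor, the alignment scheme below is a hypothetical stand-in, not the thesis's framework: one could probe each recovered component for each variable of a theoretical causal structure and assign the component to the variable it predicts best.

```python
# Illustrative sketch: align discovered components to nodes of a hypothesized
# causal structure by probing each component for each causal variable.
# The probing/alignment scheme here is our own stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def align_components(activations, pathways, causal_vars):
    """activations: (n_examples, n_neurons); pathways: {name: neuron indices};
    causal_vars: {variable_name: (n_examples,) labels from the causal model}."""
    alignment = {}
    for comp, idx in pathways.items():
        comp_acts = activations[:, idx]
        scores = {
            var: cross_val_score(
                LogisticRegression(max_iter=1000), comp_acts, y, cv=3
            ).mean()
            for var, y in causal_vars.items()
        }
        # Assign the component to the causal variable it best predicts.
        alignment[comp] = max(scores, key=scores.get)
    return alignment

# Toy usage with randomly generated activations and binary causal variables.
rng = np.random.default_rng(1)
acts = rng.standard_normal((200, 64))
paths = {0: np.arange(0, 32), 1: np.arange(32, 64)}
variables = {"negation": rng.integers(0, 2, 200), "topic": rng.integers(0, 2, 200)}
print(align_components(acts, paths, variables))
```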

This last contribution comprises three facets, distinguished by domain and focus. First, neural pathways are employed to scaffold a neural discourse parser so that it generalizes more easily to student writing. Next, we demonstrate that neural pathways can serve as a method for error analysis by exploring the performance discrepancy between models trained to detect transactivity in different domains. Lastly, we demonstrate the capability of tracking changes in problematic pathways across the fine-tuning of an AI writing detector.
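
For the last facet, here is a toy sketch of what tracking a problematic pathway across fine-tuning might look like; the drift measure is our own illustrative choice, not the thesis's metric. The idea is to evaluate the pathway's units on a fixed probe set before and after fine-tuning and compare their response profiles.

```python
# Illustrative sketch: measure how a flagged pathway drifts across fine-tuning
# by comparing its activations on the SAME probe examples before and after.
# The drift score (1 - mean per-neuron correlation) is a hypothetical choice.
import numpy as np

def pathway_drift(acts_before, acts_after, neuron_idx):
    """Both activation matrices are (n_examples, n_neurons) on identical
    probe examples; neuron_idx selects the pathway's units."""
    a = acts_before[:, neuron_idx]
    b = acts_after[:, neuron_idx]
    # Standardize each neuron's responses across the probe set.
    a_c = (a - a.mean(0)) / (a.std(0) + 1e-8)
    b_c = (b - b.mean(0)) / (b.std(0) + 1e-8)
    # Mean of elementwise products of z-scores = per-neuron Pearson correlation.
    per_neuron_corr = (a_c * b_c).mean(0)
    return 1.0 - per_neuron_corr.mean()  # 0 = unchanged, up to 2 = inverted
```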

Given the broad applicability of the neural pathways approach, we are optimistic that the method can have a wide impact on the design and development of neural models, and we aim to provide a foundational work that can be extended far beyond the scope of this thesis.

History

Date

2023-11-27

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Carolyn Rose
