Carnegie Mellon University

Towards Model Understanding: A Framework to Furnish and Evaluate Interpretations

Thesis, posted on 2024-02-21 by Danish Pruthi

While deep learning models have become increasingly accurate over the last decade, concerns about their (lack of) interpretability have taken center stage. In response, a growing sub-field on the interpretability and analysis of these models has emerged. Interpretability does not refer to a single technical challenge; it is an umbrella term encompassing efforts to understand learned models and to communicate that understanding to stakeholders. In this thesis, we make progress towards these goals: in the first part, we devise methods for interpreting word representations and supplementing predictions with evidence; in the second part, we focus on evaluating model explanations, a fundamental issue facing much of interpretability research. For many natural language tasks, people can distinguish good outputs from bad ones (e.g., a bilingual speaker can tell a good translation apart from its worse alternatives); however, evaluating the quality of explanations is not as straightforward. To this end, we present a novel framework to quantify the value of explanations. Our framework draws upon argumentative theories of human reasoning, which posit that effective explanations communicate how decisions are made and can help people predict how later decisions will be made. Using our framework, we quantitatively compare several explanation generation methods at scale. As an extension of the framework, we conduct an interactive crowdsourced study to test the degree to which explanations improve users' understanding of the models.

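The evaluation idea in the abstract can be made concrete with a small sketch. The toy Python example below is illustrative only, not the thesis's actual code: all names (simulation_gain, make_teacher, train_student) are hypothetical. It scores an explanation method by how much it helps a "student" learn to predict a "teacher" model's decisions, with explanations available during student training but withheld at test time.

    import random

    DIM = 20  # toy inputs are DIM-dimensional sign vectors


    def make_teacher(key_idx):
        """Teacher predicts the sign of a single 'key' feature; the rest is noise."""
        return lambda x: int(x[key_idx] > 0)


    def explain(key_idx, x):
        """A faithful explanation: the feature the teacher relied on for input x."""
        return key_idx


    def train_student(examples, explanations=None):
        """Toy student: pick the single feature most consistent with the
        teacher's labels; with explanations, it is told which feature to use."""
        if explanations is not None:
            idx = explanations[0]  # faithful explanations all name the same feature
        else:
            def agreement(i):
                return sum((x[i] > 0) == y for x, y in examples)
            idx = max(range(DIM), key=agreement)
        return lambda x: int(x[idx] > 0)


    def simulation_gain(teacher, explainer, train_xs, test_xs):
        """Student accuracy at predicting the teacher when trained WITH
        explanations, minus accuracy when trained WITHOUT them.
        Explanations are never shown at test time."""
        pairs = [(x, teacher(x)) for x in train_xs]
        with_expl = train_student(pairs, [explainer(x) for x in train_xs])
        without_expl = train_student(pairs)

        def acc(student):
            return sum(student(x) == teacher(x) for x in test_xs) / len(test_xs)

        return acc(with_expl) - acc(without_expl)


    if __name__ == "__main__":
        random.seed(0)
        rand_x = lambda: [random.choice([-1.0, 1.0]) for _ in range(DIM)]
        gains = []
        for _ in range(50):
            key = random.randrange(DIM)
            teacher = make_teacher(key)
            explainer = lambda x, k=key: explain(k, x)
            # With only 4 training examples, noise features often tie the key
            # feature, so faithful explanations should yield a positive gain.
            gains.append(simulation_gain(teacher, explainer,
                                         [rand_x() for _ in range(4)],
                                         [rand_x() for _ in range(200)]))
        print(f"mean simulation gain: {sum(gains) / len(gains):+.3f}")

Withholding explanations at test time is the key design choice in this sketch: it prevents an explanation from trivially leaking the teacher's label, so any gain must come from the student having learned how the teacher decides.
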
History

Date

2021-12-17

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Zachary C. Lipton, Graham Neubig
