Carnegie Mellon University

Towards Model Understanding: A Framework to Furnish and Evaluate Interpretations

Thesis, posted on 2024-02-21 by Danish Pruthi

While deep learning models have become increasingly accurate over the last decade, concerns about their (lack of) interpretability have taken center stage. In response, a growing sub-field on the interpretability and analysis of these models has emerged. Interpretability does not refer to a single technical challenge; it is an umbrella term encompassing efforts to understand learned models and to communicate that understanding to stakeholders. In this thesis, we make progress towards these goals: in the first part, we devise methods for interpreting word representations and supplementing predictions with evidence; in the second part, we focus on evaluating model explanations, a fundamental issue facing much of interpretability research. For many natural language tasks, people can distinguish good outputs from bad ones (e.g., a bilingual speaker can tell a good translation apart from its worse alternatives); however, evaluating the quality of explanations is not as straightforward. To this end, we present a novel framework to quantify the value of explanations. Our framework draws upon argumentative theories of human reasoning, which posit that effective explanations communicate how decisions are made and can help people predict how later decisions will be made. Using our framework, we quantitatively compare several explanation generation methods at scale. As an extension of the framework, we conduct an interactive crowdsourced study to test the degree to which explanations improve users' understanding of the models.

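The evaluation idea in the abstract can be made concrete with a small sketch. The toy Python example below is illustrative only, not the thesis's actual code: all names (simulation_gain, make_teacher, train_student) are hypothetical. It scores an explanation method by how much it helps a "student" learn to predict a "teacher" model's decisions, with explanations available during student training but withheld at test time.

    import random

    DIM = 20  # toy inputs are DIM-dimensional sign vectors


    def make_teacher(key_idx):
        """Teacher predicts the sign of a single 'key' feature; the rest is noise."""
        return lambda x: int(x[key_idx] > 0)


    def explain(key_idx, x):
        """A faithful explanation: the feature the teacher relied on for input x."""
        return key_idx


    def train_student(examples, explanations=None):
        """Toy student: pick the single feature most consistent with the
        teacher's labels; with explanations, it is told which feature to use."""
        if explanations is not None:
            idx = explanations[0]  # faithful explanations all name the same feature
        else:
            def agreement(i):
                return sum((x[i] > 0) == y for x, y in examples)
            idx = max(range(DIM), key=agreement)
        return lambda x: int(x[idx] > 0)


    def simulation_gain(teacher, explainer, train_xs, test_xs):
        """Student accuracy at predicting the teacher when trained WITH
        explanations, minus accuracy when trained WITHOUT them.
        Explanations are never shown at test time."""
        pairs = [(x, teacher(x)) for x in train_xs]
        with_expl = train_student(pairs, [explainer(x) for x in train_xs])
        without_expl = train_student(pairs)

        def acc(student):
            return sum(student(x) == teacher(x) for x in test_xs) / len(test_xs)

        return acc(with_expl) - acc(without_expl)


    if __name__ == "__main__":
        random.seed(0)
        rand_x = lambda: [random.choice([-1.0, 1.0]) for _ in range(DIM)]
        gains = []
        for _ in range(50):
            key = random.randrange(DIM)
            teacher = make_teacher(key)
            explainer = lambda x, k=key: explain(k, x)
            # With only 4 training examples, noise features often tie the key
            # feature, so faithful explanations should yield a positive gain.
            gains.append(simulation_gain(teacher, explainer,
                                         [rand_x() for _ in range(4)],
                                         [rand_x() for _ in range(200)]))
        print(f"mean simulation gain: {sum(gains) / len(gains):+.3f}")

Withholding explanations at test time is the key design choice in this sketch: it prevents an explanation from trivially leaking the teacher's label, so any gain must come from the student having learned how the teacher decides.
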
History

Date

2021-12-17

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Zachary C. Lipton, Graham Neubig
