Towards Model Understanding: A Framework to Furnish and Evaluate Interpretations
While deep learning models have become increasingly accurate over the last decade, concerns about their (lack of) interpretability have taken center stage. In response, a growing sub-field on the interpretability and analysis of these models has emerged. Interpretability does not refer to a single technical challenge; it is an umbrella term encompassing efforts to understand learned models and to communicate that understanding to stakeholders. In this thesis, we make progress towards these goals: in the first part, we devise methods for interpreting word representations and supplementing predictions with evidence; in the second part, we focus on evaluating model explanations, a fundamental issue facing much of interpretability research. For many natural language tasks, people can distinguish good outputs from bad ones (e.g., a bilingual speaker can tell a good translation apart from its worse alternatives); however, evaluating the quality of explanations is not as straightforward. To this end, we present a novel framework to quantify the value of explanations. Our framework draws upon argumentative theories of human reasoning, which posit that (effective) explanations communicate how decisions are made and can help people predict how later decisions will be made. Using our framework, we quantitatively compare several explanation generation methods at scale. As an extension of our framework, we conduct an interactive crowdsourced study to test the degree to which explanations improve users’ understanding of the models.
History
Date
- 2021-12-17
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)