Explaining and Evaluating Deep Neural Networks in Natural Language Processing
Deep neural networks such as recurrent neural networks and Transformer models are widely adopted for their superior performance in many natural language processing (NLP) applications, but they remain largely black boxes, which can lead to unwanted bias and brittle behavior. To discover, diagnose, and evaluate these issues, it is essential to make NLP systems accountable, reliable, and, most importantly, explainable. Explainability tools and methods can be leveraged by NLP practitioners to better understand how models make certain predictions, whether the models are conceptually sound, and whether their predictions are justified.
In this dissertation, we propose new explainability frameworks for NLP models and show how these frameworks can be used to understand and evaluate NLP models beyond their accuracy metrics. Inspired by prior explainability approaches, our methods are designed to be particularly well suited to textual data and NLP model architectures.
Our proposed methods help reveal exactly how a linguistic concept is represented in a model by answering two conceptual-soundness questions: (1) how does important linguistic information flow from input words to output predictions? (2) how is the relative order of words and phrases encoded? We experiment with both recurrent and Transformer architectures across a variety of NLP tasks and show how our methods precisely and faithfully explain and evaluate the inner workings and order sensitivity of NLP models. Furthermore, we present example explanations for key NLP tasks, depicting how various linguistic concepts are represented, whether those representations are compatible with the underlying grammatical and linguistic rules, and how deviations from those rules manifest as model weaknesses and conceptual unsoundness.
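To give a concrete sense of the first question above, tracing how information flows from input words to a prediction, the sketch below shows a generic gradient-times-input saliency score computed on a toy recurrent classifier. This is only an illustration of the general idea under assumed names (ToyClassifier, token_saliency), not the dissertation's specific explanation method.

```python
# A minimal, generic sketch of gradient-based input attribution (saliency)
# on a toy recurrent classifier. Hypothetical names; not the dissertation's method.
import torch
import torch.nn as nn


class ToyClassifier(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=16, hidden_dim=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, embeddings):
        # Takes embeddings directly so gradients can be attached to them.
        _, (h, _) = self.rnn(embeddings)
        return self.out(h[-1])


def token_saliency(model, token_ids):
    """Return one importance score per input token (gradient x input, L2 norm)."""
    embeddings = model.embed(token_ids).detach().requires_grad_(True)
    logits = model(embeddings)
    predicted_class = logits.argmax(dim=-1).item()
    # Backpropagate the predicted class score to the input embeddings.
    logits[0, predicted_class].backward()
    return (embeddings.grad * embeddings).norm(dim=-1).squeeze(0)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyClassifier()
    token_ids = torch.tensor([[4, 17, 42, 8]])  # a hypothetical 4-token sentence
    print(token_saliency(model, token_ids))     # one saliency score per token
```

A higher score suggests that the corresponding word contributed more strongly to the predicted class; comparing such scores against grammatical expectations is one simple way to probe conceptual soundness.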
History
Date
- 2022-05-04
Degree Type
- Dissertation
Department
- Electrical and Computer Engineering
Degree Name
- Doctor of Philosophy (PhD)