Objective Criteria for Explainable Machine Learning
As deep learning methods have obtained tremendous success over the years, our understanding of these models has yet to keep up with their development. Explainable machine learning is one of the main research fields dedicated to understanding complex machine learning models. While the number of proposed explanation methods keeps increasing, the evaluation of explanations remains an open question. Evaluations involving humans are expensive during the development phase of explanations. To address the difficulty of involving humans in the loop during the design of explanations, this thesis aims to define objective criteria that allow one to measure some goodness property of explanations without humans, and to design explanations that are desirable with respect to these objective criteria.
In this thesis, we discuss different criteria for making the evaluation of explainable AI methods more objective, where our methods can mainly be categorized into three prongs: (a) faithfulness-oriented, (b) theoretically motivated, and (c) application-driven. A faithfulness-oriented metric is usually connected to the core tenet that an explanation of the model should faithfully “explain” the model. Theoretically motivated objective criteria usually take the form “when the model and data satisfy a certain property, the explanation should satisfy a corresponding property”. Application-driven objective criteria quantitatively simulate how explanations can help in certain applications without humans. We design objective criteria for different types of explanations and use these objective criteria to guide the design of new explanations. Finally, human studies are conducted to verify the design of these new explanations.
- Machine Learning
- Doctor of Philosophy (PhD)