Carnegie Mellon University
CMU-CS-21-121.pdf (2.27 MB)

Black-Box Approaches to Fair Machine Learning

posted on 2021-11-09, 21:51 authored by Samuel Yeom
As machine learning models are increasingly used to make consequential decisions involving humans, it is important that the models do not discriminate on the basis of protected attributes such as race and gender. However, model holders are not the people who bear the brunt of the harm from a discriminatory model, so they have little natural incentive to fix it.
It would thus be societally beneficial if other entities could also detect or mitigate unfair behavior in these models. Black-box methods, which require only query access to the model, are well suited for this purpose, as they are feasible to carry out without knowing the full details of the model. In this thesis, I consider three different forms of unfairness and present black-box methods that address them. The first of the three is proxy use, where some component of a model is a proxy for a protected attribute. The second is the lack of individual fairness, which formalizes the intuitive notion that a model should not make arbitrary decisions. Finally, a model may have an unrepresentative training set, which can lead the model to exhibit varying degrees of accuracy for different protected groups. For each of these behaviors, I propose one or more methods that can help detect such behavior in models or ensure the lack thereof. These methods require only black-box access to the model, allowing them to be effective even when the model holder is uncooperative. My theoretical and experimental analyses of these methods provide evidence of their effectiveness in this setting, showing that they are useful technical tools that can support an effective response against discrimination.




Degree Type

  • Dissertation


Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • Matt Fredrikson