Fairness and Privacy Violations in Black-Box Personalization Systems: Detection and Defenses
Black-box personalization systems have become ubiquitous in our daily lives. They use data collected about us to make critical decisions, such as those related to credit approval and insurance premiums. This raises concerns about whether these systems respect expectations of fairness and privacy. Because these systems are black boxes, it is challenging to test whether they satisfy fundamental fairness and privacy properties. For the same reason, while many black-box privacy-enhancing technologies offer consumers the ability to defend themselves from data collection, it is unclear how effective they are. In this doctoral thesis, we demonstrate that carefully designed methods and tools that soundly and scalably discover causal effects in black-box software systems are useful for evaluating how well personalization systems and privacy-enhancing technologies protect fairness and privacy. As an additional defense against discrimination, this thesis also explores the legal liability of ad platforms for serving discriminatory ads.
To formally study fairness and privacy properties in black-box personalization systems, we translate these properties into instances of information flow and develop methods to detect information flow. First, we establish a formal connection between information flow and causal effects. As a consequence, we can use randomized controlled experiments, traditionally used to detect causal effects, to detect information flow through black-box systems. We develop AdFisher as a general framework for performing information flow experiments scalably on web systems and use it to evaluate discrimination, transparency, and choice in Google’s advertising ecosystem. We find evidence of gender-based discrimination in employment-related ads and a lack of transparency in Google’s transparency tool when ads for rehabilitation centers are served after visits to websites about substance abuse.
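The idea of detecting information flow with a randomized controlled experiment can be illustrated with a minimal sketch. The abstract does not state which test statistic or significance test is used, so the permutation test below is an assumption, chosen as one standard way to analyze a randomized experiment. The function name, the difference-in-means statistic, and the ad counts are all hypothetical and are not taken from AdFisher.

```python
import random


def permutation_test(control, treatment, n_permutations=10000, seed=0):
    """One-sided permutation p-value for 'treatment mean > control mean'.

    Hypothetical helper: each value is an outcome measured for one
    randomly assigned experimental unit (for example, how many times
    a browser instance was shown a particular ad).
    """
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)

    pooled = list(control) + list(treatment)
    n_treat = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis (no information flow), group labels
        # are exchangeable, so relabel the units at random.
        rng.shuffle(pooled)
        perm_treat = pooled[:n_treat]
        perm_ctrl = pooled[n_treat:]
        stat = sum(perm_treat) / n_treat - sum(perm_ctrl) / len(perm_ctrl)
        if stat >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up data: ad impressions per browser instance in each group.
control_counts = [0, 1, 0, 2, 1, 0, 1, 0, 1, 1]
treatment_counts = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]
p_value = permutation_test(control_counts, treatment_counts)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests that the randomly assigned input had a causal effect on the observed output, which under the connection described above is evidence of information flow from that input to the output. A permutation test is convenient in this setting because it relies only on the random assignment and makes no assumption about how the black-box system distributes its outputs.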
Given the presence of discrimination and the use of sensitive attributes in personalization systems, we explore possible defenses for consumers. First, we evaluate the effectiveness of publicly available privacy-enhancing technologies in protecting consumers from data collection by online trackers. Specifically, we use a combination of experimental and observational approaches to examine how well these technologies protect consumers against fingerprinting, an advanced form of tracking. Next, we explore whether an advertising platform like Google could be legally liable for delivering employment and housing ads in a discriminatory manner under Title VII and the Fair Housing Act, respectively. We find that an ad platform is unlikely to incur liability under Title VII due to the statute’s limited coverage. However, we argue that housing ads violating the Fair Housing Act could create liability if the ad platform targets ads toward or away from protected classes without explicit instructions from the advertiser.