Robustifying NLP with Humans in the Loop
Despite machine learning (ML)'s many practical breakthroughs, formidable obstacles obstruct its deployment in consequential applications. Modern ML models have repeatedly been shown to rely on spurious signals, such as surface-level textures in images, and to be sensitive to background scenery even when the task addresses the recognition of foreground objects. In NLP, these issues have emerged as central concerns in the literature on annotation artifacts and bias. Moreover, while modern ML performs remarkably well on independent and identically distributed (iid) holdout data, performance often decays catastrophically under both naturally occurring and adversarial distribution shift. We desire decisions to be based on qualifications, not on distant proxies that are spuriously associated with the outcome of interest. Arguably, one key distinction of an actual qualification is that it exerts causal influence on the outcome of interest. In this thesis, we make progress toward these goals. In the first part, we scrutinize benchmarks and problem formulations for popular NLP tasks, such as question answering, showing how models may ignore crucial parts of the input altogether and yet perform well on a held-out test set. In the second part, we introduce methods and datasets for training models to be less reliant on spurious correlations by learning from several forms of human feedback (sought via crowdsourcing). In the third part, we focus on the human workforce: we discuss the ethical tensions posed by the diverse roles played by crowdworkers in NLP research, and the implications of selecting a diverse cohort of crowdworkers for the resulting human-in-the-loop feedback.
Department: Language Technologies Institute
Degree: Doctor of Philosophy (PhD)