Robustifying NLP with Humans in the Loop
Despite machine learning (ML)'s many practical breakthroughs, formidable obstacles obstruct its deployment in consequential applications. Modern ML models have repeatedly been shown to rely on spurious signals, such as surface-level textures in images, and to be sensitive to background scenery even when the task addresses the recognition of foreground objects. In NLP, these issues have emerged as central concerns in the literature on annotation artifacts and bias. Moreover, while modern ML performs remarkably well on independent and identically distributed (iid) holdout data, performance often decays catastrophically under both naturally occurring and adversarial distribution shift. We desire decisions to be based on qualifications, not on distant proxies that are spuriously associated with the outcome of interest. Arguably, one key distinction of an actual qualification is that it exerts causal influence on the outcome of interest. In this thesis, we make progress toward these goals. In the first part, we scrutinize benchmarks and problem formulations for popular NLP tasks, such as question answering, showing how models may ignore crucial parts of the input altogether and yet perform well on a held-out test set. In the second part, we introduce methods and datasets for training models to be less reliant on spurious correlations by learning from several forms of human feedback (sought via crowdsourcing). In the third part, we focus on the human workforce: we discuss the ethical tensions posed by the diverse roles played by crowdworkers in NLP research, and the implications of selecting a diverse cohort of crowdworkers for the resulting human-in-the-loop feedback.
Department: Language Technologies Institute
Degree: Doctor of Philosophy (PhD)