Carnegie Mellon University
dkaushik_phd_lti_2022.pdf (2.57 MB)

Robustifying NLP with Humans in the Loop

thesis
posted on 2023-03-03, 19:42 authored by Divyansh Kaushik

Despite machine learning (ML)’s many practical breakthroughs, formidable obstacles obstruct its deployment in consequential applications. Modern ML models have repeatedly been shown to rely on spurious signals, such as surface-level textures in images, and to be sensitive to background scenery, even when the task addresses the recognition of foreground objects. In NLP, these issues have emerged as central concerns in the literature on annotation artifacts and bias. Moreover, while modern ML performs remarkably well on independent and identically distributed (iid) hold-out data, performance often decays catastrophically under both naturally occurring and adversarial distribution shift. We desire decisions to be based on qualifications, not on distant proxies that are spuriously associated with the outcome of interest. Arguably one key distinction of an actual qualification might be that it actually exerts causal influence on the outcome of interest. In this thesis, we make progress towards these goals: in the first part, we scrutinize benchmarks and problem formulation for popular NLP tasks, such as question answering and how models may ignore crucial parts of the input altogether and yet perform well on a held out test set; in the second part, we focus on introducing methods and datasets to train models to be less reliant on spurious correlations by learning from several forms of human feedback (sought via crowdsourcing); in part three we focus on the human workforce as we discuss the ethical tensions posed by the diverse roles played by crowdworkers in NLP research, and discuss the implications of selecting a diverse cohort of crowdworkers on resulting human-in-the-loop feedback.

History

Date

2022-11-17

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Zachary C. Lipton, Eduard Hovy
