Carnegie Mellon University

Mitigating Negative Transfer for Better Generalization and Efficiency in Transfer Learning

Thesis posted on 2023-01-06, 21:22, authored by Zirui Wang

The traditional machine learning paradigm of training a task-specific model on a single task has led to state-of-the-art performance in many fields (e.g., computer vision and natural language processing). To broaden the applicability of machine learning models, transfer learning aims to adapt knowledge learned from source task(s) to improve performance on target task(s). However, the existing transfer learning paradigm remains understudied: we have limited knowledge of its potential limitations, its underlying mechanisms, and solutions for more intelligent transfer. In particular, transferring knowledge from a less related source can instead hurt target performance, a phenomenon known as negative transfer. Nonetheless, the cause of negative transfer is ill-defined, and it is not clear how negative transfer affects models' generalization and sample efficiency.
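
To make the phenomenon concrete, one common formalization (a sketch in our own notation; the thesis may define it differently) compares the target error of a model trained with access to the source data against a target-only baseline:

```latex
% Illustrative formalization of negative transfer (notation assumed here,
% not taken from the thesis): A(S, T) is the model an algorithm A learns
% from source data S and target data T; \epsilon_T is the expected test
% error on the target task. Negative transfer occurs exactly when using
% the source data leaves the target task worse off than ignoring it:
\[
  \epsilon_T\bigl(A(S, T)\bigr) \;>\; \epsilon_T\bigl(A(\emptyset, T)\bigr)
\]
```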

In this thesis, with the goal of thoroughly characterizing and addressing negative transfer in machine learning models, we carefully study negative transfer in popular vision and NLP setups, glean insights into its causes, and propose solutions that lead to improved generalization and sample efficiency. The thesis consists of three parts. The first part conducts a systematic analysis of negative transfer in state-of-the-art transfer learning models. We formally characterize the conditions under which it arises in both domain adaptation and multilingual NLP models, and demonstrate that task conflict is a key factor of negative transfer. In the second part, we propose various alignment methods that enhance the generalization of transferable models by resolving the aforementioned task conflicts with better-aligned representations and gradients. Finally, in the third part, we explore sample-efficient transfer learning algorithms that mitigate negative transfer using less training and/or alignment data. The contributions of this thesis include new insights on addressing negative transfer and a series of practical methods and algorithms that improve model generalization and efficiency.
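
As an illustration of what gradient-level alignment can look like, below is a minimal sketch of one well-known strategy, projecting away conflicting gradient components in the spirit of PCGrad (Yu et al., 2020). The function name and the two-task setup are our assumptions for illustration; the thesis's own methods may differ:

```python
import numpy as np

def align_gradients(g_src: np.ndarray, g_tgt: np.ndarray) -> np.ndarray:
    """Combine two task gradients; if they conflict (negative inner
    product), project each onto the normal plane of the other first.
    Assumes both gradients are non-zero."""
    dot = float(np.dot(g_src, g_tgt))
    g1, g2 = g_src, g_tgt
    if dot < 0:  # conflicting directions: one task's step undoes the other's
        g1 = g_src - dot / np.dot(g_tgt, g_tgt) * g_tgt
        g2 = g_tgt - dot / np.dot(g_src, g_src) * g_src
    return 0.5 * (g1 + g2)  # shared update applied to the joint model

# Example: two conflicting gradients. The aligned update keeps a
# non-negative inner product with each task's original gradient.
g_a = np.array([1.0, 0.0])
g_b = np.array([-0.5, 1.0])
print(align_gradients(g_a, g_b))  # [0.4, 0.7]
```

When the two gradients already agree (non-negative inner product), the update reduces to their plain average; the projection intervenes only when the tasks pull in opposing directions, which is precisely the task-conflict regime associated with negative transfer above.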

History

Date

2021-11-23

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Yulia Tsvetkov, Emma Strubell
