Mitigating Negative Transfer for Better Generalization and Efficiency in Transfer Learning
The traditional machine learning paradigm of training a task-specific model on a single task has led to state-of-the-art performance in many fields (e.g., computer vision and natural language processing). To enable wider applicability of machine learning models, transfer learning aims to adapt knowledge learned from source task(s) to improve performance on target task(s). However, the existing transfer learning paradigm remains understudied, and we have limited knowledge of its potential limitations, underlying mechanisms, and possible solutions for more intelligent transfer. In particular, transferring knowledge from a less related source may actually hurt target performance, a phenomenon known as negative transfer. Nonetheless, the cause of negative transfer is ill-defined, and it is not clear how negative transfer affects models' generalization and sample efficiency.
In this thesis, with the goal of thoroughly characterizing and addressing negative transfer in machine learning models, we carefully study negative transfer in popular vision and NLP setups, glean insights into its causes, and propose solutions that lead to improved generalization and sample efficiency. The thesis consists of three parts. The first part conducts a systematic analysis of negative transfer in state-of-the-art transfer learning models: we formally characterize its conditions in both domain adaptation and multilingual NLP models, and identify task conflict as a key factor behind negative transfer. In the second part, we propose alignment methods that enhance the generalization of transferable models by resolving these task conflicts through better-aligned representations and gradients. Finally, in the third part, we explore sample-efficient transfer learning algorithms that mitigate negative transfer using less training and/or alignment data. The contributions of this thesis include new insights on addressing negative transfer and a series of practical methods and algorithms that improve model generalization and efficiency.
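As a rough illustration of what resolving gradient-level task conflicts can look like, the sketch below removes the conflicting component of one task's gradient before combining updates, in the spirit of gradient-projection methods such as PCGrad. It is a minimal sketch, not the thesis's exact algorithm; the function name and gradient values are hypothetical.

```python
import torch

def project_conflicting(grad_a: torch.Tensor, grad_b: torch.Tensor) -> torch.Tensor:
    """Remove grad_a's component along grad_b when the two gradients conflict
    (negative inner product), so the combined update no longer fights itself."""
    dot = torch.dot(grad_a, grad_b)
    if dot < 0:  # the two tasks pull the shared parameters in opposing directions
        grad_a = grad_a - (dot / grad_b.norm() ** 2) * grad_b
    return grad_a

# Toy usage with two hypothetical task gradients over shared parameters.
g_src = torch.tensor([1.0, -2.0])  # source-task gradient (illustrative values)
g_tgt = torch.tensor([1.0, 1.0])   # target-task gradient (illustrative values)
g_src_aligned = project_conflicting(g_src, g_tgt)
update = g_src_aligned + g_tgt     # conflict-reduced combined update
print(update)  # tensor([2.5000, -0.5000])
```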
History
Date
- 2021-11-23
Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)