Carnegie Mellon University

On Resource Efficient Transfer Learning via End Task Aware Training

thesis
posted on 2024-09-16, 19:35 authored by Lucio Mwinmaarong Dery

Transfer learning is a machine learning (ML) paradigm where performance on a desired end task is improved by exploiting "knowledge" from other tasks. The technique has become a critical workhorse driving many of the advances that push the envelope of what machine learning models can do. The current formula is relatively simple: train a large model on large amounts of data from the transfer task(s), then apply the learned model either zero-shot or adapted to the desired downstream task(s).

This thesis recognizes that these powerful models are not developed in vacuo but rather require non-trivial resources to train and deploy. As such, there is a wide range of salient problems and communities of researchers that the status quo leaves behind. In the first part of this thesis, we will focus on the training-time problem of data-efficient transfer learning. We will begin by making a case for exploiting advance knowledge of the desired downstream task(s), which is commonly available in many ML settings, to inform different dimensions of transfer learning. We dub this end task aware transfer learning. Next, we will present a set of novel end task aware optimization algorithms that bias the learning trajectory towards data-efficient solutions with strong generalization on the end task. We will close this part by providing an automated approach to constructing and searching over task-relevant transfer objectives when only end task data is available and in limited amounts.

For the second part of this thesis, we will develop algorithms for compute- and memory-efficient transfer learning. Our goal will be to deliver a small and efficient yet performant task-specific model for deployment, seeded from a large, generalist model that has already been pre-trained on a transfer task (or set of tasks). Focusing on structured pruning as the technique for making models smaller, we will investigate pruning under two resource-constrained settings: (1) limited task data, where we will exploit extra transfer tasks to learn pruning structures that, at the same task performance, lead to more compute- and memory-efficient models; and (2) limited memory, where many of the classical pruning techniques break down because they require gradient-based optimization, which can have prohibitive memory overhead.

This thesis concludes by presenting avenues for future work on resource-efficient transfer learning, building on our past work and suggesting novel branches of investigation.

History

Date

2024-07-18

Degree Type

  • Dissertation

Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Graham Neubig, Ameet Talwalkar