Carnegie Mellon University

Efficient Lifelong Learning in Deep Neural Networks: Optimizing Architecture, Training, and Data

posted on 2024-02-23, 14:53, authored by Sanket Vaibhav Mehta

The prevalent machine learning paradigm involves training a separate model for every new task on a static dataset. In contrast, humans accumulate knowledge over time, and the lifelong learning paradigm seeks to emulate this process by enabling systems to learn continuously from a stream of tasks, retaining past knowledge for efficient future learning. This paradigm also offers advantages such as avoiding periodic model retraining, potentially reducing computational and energy requirements, and promoting environmentally friendly Green AI. Deep neural networks, while powerful, face challenges such as catastrophic forgetting (losing knowledge from previous tasks while learning new ones) and negative interference (previously learned knowledge hindering new-task learning). These issues stem from the stability-plasticity dilemma, which requires striking the right balance between preserving past knowledge (stability) and acquiring new knowledge (plasticity). Efficient lifelong learning systems must address this dilemma, along with other considerations such as supporting online data streams, operating within a small, fixed memory buffer (if any), and learning from unlabeled data streams.
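
To make the fixed-capacity memory constraint concrete, the sketch below shows one common way such a rehearsal buffer can be maintained over a data stream, via reservoir sampling. This is a minimal illustration only; the class and method names are assumptions and are not taken from the thesis.

```python
import random


class RehearsalBuffer:
    """Fixed-capacity memory of past examples, filled via reservoir sampling.

    Illustrative sketch: keeps an (approximately) uniform sample of the
    stream seen so far while never exceeding the fixed capacity.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling: every example in the stream has an equal
        # chance of residing in the buffer, regardless of stream length.
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            idx = self.rng.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = example

    def sample(self, batch_size):
        # Replayed past examples are mixed into each new-task batch
        # (stability), while the rest of the batch comes from the
        # current task (plasticity).
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(self.buffer, k)
```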

In this thesis, we draw inspiration from the biological learning process and recent progress in deep learning to enable efficient lifelong learning systems. We propose injecting inductive biases into the three main components of data-driven machine learning: the model (architecture & initialization), training (objective & optimization), and data. The thesis is structured into three parts, each corresponding to one of these components. In the first part, we explore the role of pre-trained initializations, revealing that they implicitly alleviate forgetting compared to random initializations. Next, we design a parameter-efficient expert architecture that dynamically expands learning capacity to address the stability-plasticity dilemma. In the second part, we demonstrate that explicit optimization for flat minima improves network stability and introduce a meta-learning objective that balances stability and plasticity. The third part delves into lifelong semi-supervised learning, addressing the stability-plasticity dilemma by rehearsing pseudo-labeled data. We conclude by examining pre-training through the lens of lifelong learning, showcasing improvements obtained by applying the strategies developed above to the (continual) pre-training of models.
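
As a rough illustration of what explicit optimization for flat minima can look like in practice, the following PyTorch-style sketch implements a generic sharpness-aware update: perturb the weights toward the locally worst-case direction, then take the optimizer step using the gradient computed at that perturbed point. The function name, hyperparameters, and training-loop details here are assumptions for illustration, not the thesis's exact recipe.

```python
import torch


def sharpness_aware_step(model, loss_fn, batch, optimizer, rho=0.05):
    """One flat-minima-seeking update (SAM-style), as a hedged sketch."""
    inputs, targets = batch

    # First pass: gradient at the current weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Perturb each parameter a small step along the normalized gradient,
    # i.e. toward the locally worst-case (sharpest) direction.
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm(2) for p in model.parameters() if p.grad is not None]))
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)
    optimizer.zero_grad()

    # Second pass: the gradient at the perturbed point defines the update.
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)  # restore the original weights before stepping
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```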

History

Date

2023-11-30

Degree Type

  • Dissertation

Department

  • Language Technologies Institute

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Emma Strubell