Equilibrium Approaches to Modern Deep Learning
Deep learning (DL) has become one of the most successful and widely adopted methods in modern artificial intelligence. These successes have been accompanied by increasingly complex and costly architectural designs, all built on one core concept: layers. This thesis challenges the fundamental role of layers and provides an in-depth introduction to a new, layer-less paradigm of deep learning that computes the output as the fixed point of a dynamical system: deep equilibrium (DEQ) models.
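The equation below sketches the idea. The symbols f_θ, x, z and g_θ are introduced here only for illustration. A DEQ applies a single transformation whose parameters θ are shared across every level. Its output is defined as the equilibrium that this weight-tied recursion reaches as depth grows without bound, assuming such a fixed point exists. The recursion is not unrolled. Instead, z* is obtained directly by applying a root-finding solver to f_θ(z; x) − z = 0:

```latex
z^{[i+1]} = f_\theta\big(z^{[i]};\, x\big), \quad i = 0, 1, 2, \dots
\qquad \xrightarrow{\; i \to \infty \;} \qquad
z^\star = f_\theta\big(z^\star;\, x\big)
\;\Longleftrightarrow\;
g_\theta(z^\star; x) := f_\theta(z^\star; x) - z^\star = 0
```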
First, we introduce the general formulation of deep equilibrium models. We discuss how these models express “infinite-level” neural networks and decouple the forward and backward passes, yet do so with the cost and design complexity of only one traditional layer, even in some of the most competitive settings (e.g., language modeling and semantic segmentation).
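The pure-Python sketch below shows how the forward and backward passes come apart. It is not the thesis's implementation and rests on several assumptions:

- a hypothetical two-dimensional layer f(z; x, b) = tanh(W z + x + b);
- a hypothetical loss l(z) = 0.5 * ||z||^2;
- plain fixed-point iteration, standing in for the faster root-finding solvers used in practice.

By the implicit function theorem, the gradient at the equilibrium is dl/db = (dl/dz*) (I − J)^(-1) ∂f/∂b, where J = ∂f/∂z is evaluated at z*. The backward pass therefore needs only z* itself, not the forward solver's trajectory. It can be computed by solving a second fixed-point problem, so memory does not grow with the number of solver iterations.

```python
import math

# Hypothetical toy DEQ layer (an illustrative assumption, not the thesis's model):
#   f(z; x, b) = tanh(W z + x + b)
# W is chosen small enough that f is a contraction in z, so a unique z* exists.
W = [[0.3, -0.2],
     [0.1, 0.4]]


def f(z, x, b):
    n = len(z)
    return [math.tanh(sum(W[i][j] * z[j] for j in range(n)) + x[i] + b[i])
            for i in range(n)]


def solve_fixed_point(step, z0, tol=1e-12, max_iter=1000):
    """Plain fixed-point iteration z <- step(z), a stand-in for faster root finders."""
    z = z0
    for _ in range(max_iter):
        z_next = step(z)
        if max(abs(p - q) for p, q in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def deq_forward(x, b):
    # Only the equilibrium z* is kept; the solver's intermediate iterates are discarded.
    return solve_fixed_point(lambda z: f(z, x, b), [0.0] * len(x))


def loss(z):
    # Hypothetical loss l(z) = 0.5 * ||z||^2, so dl/dz = z.
    return 0.5 * sum(v * v for v in z)


def deq_grad_b(x, b):
    """dl/db at the equilibrium, via the implicit function theorem."""
    z_star = deq_forward(x, b)
    n = len(z_star)
    d = [1.0 - v * v for v in z_star]  # tanh'(.) evaluated at z*, since z* = tanh(.)
    g = z_star                         # dl/dz* for the loss above
    # With J = diag(d) W = df/dz at z*, solve the backward fixed point
    #   u = g + J^T u   (equivalently u^T (I - J) = g^T).
    def adjoint_step(u):
        du = [d[k] * u[k] for k in range(n)]
        return [g[j] + sum(W[k][j] * du[k] for k in range(n)) for j in range(n)]
    u = solve_fixed_point(adjoint_step, [0.0] * n)
    # df/db = diag(d), hence dl/db = u^T diag(d).
    return [d[i] * u[i] for i in range(n)]


if __name__ == "__main__":
    x, b = [0.5, -0.3], [0.1, -0.1]
    print("z* =", deq_forward(x, b))
    print("dl/db =", deq_grad_b(x, b))
```

The forward solver can be swapped for any other root finder without changing `deq_grad_b`, because the gradient depends only on the equilibrium it returns. To sanity-check such a sketch, compare `deq_grad_b` against a central finite difference of `loss(deq_forward(x, b))`.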
Second, we discuss the challenges and opportunities that such an equilibrium approach poses. We show that the DEQ formulation reveals numerous new properties of deep learning that were long buried by the traditional layer-stacking scheme. Exploiting these properties allows us to train and deploy these new, lightweight equilibrium algorithms in ways that significantly complement existing developments in deep learning. It also enables us to improve state-of-the-art results on multiple fronts (e.g., optical flow estimation).
The DEQ approach has already given rise to a new research area on implicit deep learning in the community (e.g., a NeurIPS 2020 tutorial), spanning both theoretical and empirical work. We conclude this thesis by discussing how future work could further leverage this equilibrium perspective to build more scalable, efficient, and accurate next-generation DL algorithms. This includes applications to scientific computing, whose problems are often characterized by solutions to complex, high-dimensional dynamical systems.
- Doctor of Philosophy (PhD)