Adapting to Structure and Using Structure to Adapt: Toward Explaining the Success of Modern Deep Learning
This thesis studies the remarkable success of deep learning. It offers the perspective that, rather than developing black-box generalization bounds, one particularly fruitful way to understand the success of modern deep learning is through the careful interplay between neural networks’ flexibility and structure in specific domains. In these domains, we can understand modern deep learning through its ability to (1) adapt to structure in data and (2) use its structures (architecture, pretrained initialization, etc.) to adapt. We build this perspective through a mix of theory and empirics. We begin by looking at traditional learning theory tools: generalization bounds. Specifically, we study algorithmic stability as a possible framework for explaining the performance of gradient descent in overparameterized neural networks. We provide empirical evidence that uniform stability does not appear with sufficient strength to explain the generalization performance of neural networks. Then, instead of focusing on taming deep learning’s flexibility, we recast deep learning’s flexibility as a powerful ability to adapt when just enough structure is present. In the remainder of the thesis, we carefully study three key settings (convolutional neural networks on image data, simple Transformers on basic algorithmic tasks, and pretrained language models on natural language data) that demonstrate the impressive ability of neural networks to adapt to structure in data and leverage their structures to quickly and flexibly adapt. Together, these three settings trace the evolution of training methods and paradigms over the past six years. Instead of the bleaker image painted by the more black-box approach to generalization that we began with, we use these settings to advocate for a more mechanistic and nuanced understanding of the interplay between neural networks’ flexibility and structure in specific domains.
History
Date
2024-09-12
Degree Type
- Dissertation
Thesis Department
- Machine Learning
Degree Name
- Doctor of Philosophy (PhD)