Generalization Through Richer Supervision
Machine learning methods have achieved superhuman performance in limited domains, but they still fall short in their ability to generalize to new scenarios. In this thesis, we suggest that this shortcoming stems from training objectives that are too easy: models can quickly overfit to training examples without learning general principles that apply broadly. We propose addressing this issue by recasting problems to be richer and more complex, which encourages models to understand the underlying structure rather than memorize surface statistics. We present three complementary solutions: autonomous data acquisition, better learning objectives, and careful algorithm design.
Chapter 2 develops a curiosity-based system that continually learns by searching the Internet for data that it is least knowledgeable about. Our Internet Explorer method learns to generate targeted Google queries and selectively trains on retrieved data, outperforming alternatives while requiring 32x less training time and 180x less data. Chapters 3 and 4 explore how to extract more value from existing datasets by revisiting generative classifiers, which have the harder task of modeling both inputs and labels jointly. By implementing these classifiers with modern generative architectures, we achieve significant improvements in compositional reasoning and out-of-distribution generalization. Finally, Chapter 5 advances sequence modeling algorithms by recasting masked discrete diffusion as a generalization of autoregressive models and developing efficient architectures for any-order generative modeling. This approach enables training on all possible sequence permutations, resulting in better performance on algorithmic reasoning tasks as well as substantially lower sampling latency.
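As a rough illustration of what "generative classifier" means here, the sketch below classifies by Bayes' rule. It fits a class prior p(y) and a class-conditional density p(x | y), then predicts argmax_y log p(x | y) + log p(y). This is a toy under stated assumptions, not the thesis's method. The per-class 1-D Gaussians stand in for the modern generative architectures the chapters actually use, and the names `fit_gaussian_classifier`, `log_gaussian`, and `predict` are hypothetical.

```python
import math


def fit_gaussian_classifier(xs, ys):
    """Fit a prior p(y) and a 1-D Gaussian p(x | y) per class (toy stand-in for a generative model)."""
    stats = {}
    n = len(xs)
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(pts) / len(pts)
        # Small variance floor so a class with identical points does not divide by zero.
        var = sum((p - mean) ** 2 for p in pts) / len(pts) + 1e-6
        stats[label] = (len(pts) / n, mean, var)
    return stats


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def predict(stats, x):
    """Bayes' rule: argmax_y log p(x | y) + log p(y); p(x) is shared by all classes and dropped."""
    return max(
        stats,
        key=lambda y: log_gaussian(x, stats[y][1], stats[y][2]) + math.log(stats[y][0]),
    )


if __name__ == "__main__":
    xs = [0.0, 0.2, -0.1, 3.0, 3.1, 2.9]
    ys = ["a", "a", "a", "b", "b", "b"]
    stats = fit_gaussian_classifier(xs, ys)
    print(predict(stats, 0.1), predict(stats, 2.8))
```

Because the model must explain the inputs x themselves rather than only a decision boundary, the learning problem is harder than discriminative classification. That added difficulty is the property the abstract highlights.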
Funding
MESS: Model-Building, Exploratory, Social System
Defense Advanced Research Projects Agency
Graduate Research Fellowship Program (GRFP)
Directorate for Education & Human Resources
History
Date
2025-05-07
Degree Type
- Dissertation
Thesis Department
- Machine Learning
Degree Name
- Doctor of Philosophy (PhD)