Carnegie Mellon University

Generalization Through Richer Supervision

thesis
posted on 2025-06-24, 17:49 authored by Alexander Li

Machine learning methods have achieved superhuman performance in limited domains, but they still fall short in their ability to generalize to new scenarios. In this thesis, we suggest that this shortcoming stems from the fact that training objectives are too easy. Models can quickly overfit to training examples without learning general principles that apply broadly. We propose addressing this issue by recasting problems to be richer and more complex, encouraging models to understand the underlying structure rather than memorizing surface statistics. We present three complementary solutions: autonomous data acquisition, better learning objectives, and careful algorithm design.

Chapter 2 develops a curiosity-based system that continually learns by searching the Internet for data that it is least knowledgeable about. Our Internet Explorer method learns to generate targeted Google queries and selectively trains on retrieved data, outperforming alternatives while requiring 32x less training time and 180x less data. Chapters 3 and 4 explore how to extract more value from existing datasets by revisiting generative classifiers, which have the harder task of modeling both inputs and labels jointly. By implementing these classifiers with modern generative architectures, we achieve significant improvements in compositional reasoning and out-of-distribution generalization. Finally, Chapter 5 advances sequence modeling algorithms by recasting masked discrete diffusion as a generalization of autoregressive models and developing efficient architectures for any-order generative modeling. This approach enables training on all possible sequence permutations, resulting in better performance on algorithmic reasoning tasks as well as substantially lower sampling latency.
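
To make the idea of a generative classifier concrete, the following is a minimal sketch and not the thesis's method. It assumes a toy one-dimensional input and a Gaussian class-conditional model, whereas the thesis uses modern generative architectures. The function names `fit`, `log_joint`, and `predict` are hypothetical. The classifier models the joint p(x, y) = p(x | y) p(y) and predicts the label that maximizes it.

```python
import math
from collections import defaultdict


def fit(xs, ys):
    """Estimate the prior p(y) and a 1-D Gaussian p(x | y) for each class."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n = len(xs)
    params = {}
    for y, vals in groups.items():
        mean = sum(vals) / len(vals)
        # Small constant keeps the variance strictly positive (toy assumption).
        var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
        params[y] = (len(vals) / n, mean, var)
    return params


def log_joint(x, y, params):
    """Return log p(x, y) = log p(y) + log p(x | y)."""
    prior, mean, var = params[y]
    log_lik = -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return math.log(prior) + log_lik


def predict(x, params):
    """Classify by choosing the label with the highest joint log-probability."""
    return max(params, key=lambda y: log_joint(x, y, params))


# Toy data: class "a" clusters near 0, class "b" clusters near 3.
params = fit([0.0, 0.2, -0.1, 3.0, 3.1, 2.8], ["a", "a", "a", "b", "b", "b"])
print(predict(0.1, params), predict(2.9, params))
```

Unlike a discriminative classifier, which only models p(y | x), this model must also explain the inputs themselves, which is the harder task the abstract refers to.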

Funding

MESS: Model-Building, Exploratory, Social System

Defense Advanced Research Projects Agency

Graduate Research Fellowship Program (GRFP)

Directorate for Education & Human Resources

History

Date

2025-05-07

Degree Type

  • Dissertation

Thesis Department

  • Machine Learning

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Deepak Pathak