Facets of regularization in high-dimensional learning: Cross-validation, risk monotonization, and model complexity
This thesis studies aspects of regularization in a high-dimensional regime in which the feature size grows proportionally with the sample size. Several commonly used prediction procedures, such as ridge and lasso, exhibit peculiar risk behavior in this regime: no explicit regularization (a zero penalty) can be optimal for (random-X) test error, the risk can be non-monotonic in the sample size, and the risk curve can exhibit double or multiple descents in the feature size, treated as a complexity measure. Along these three angles, we present results on cross-validation, risk monotonization, and model complexity.
Cross-validation. We show strong uniform consistency of generalized and leave-one-out cross-validation (GCV and LOOCV) for estimating the squared test error of ridge regression. Consequently, we show that ridge tuning via GCV or LOOCV almost surely delivers the optimal regularization, be it positive, negative, or zero. Furthermore, by suitably extending GCV and LOOCV, we construct consistent estimators of the entire test error distribution and a broad class of its linear and nonlinear functionals. Our results require only minimal moment assumptions on the data distribution and are model-agnostic.
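The GCV and LOOCV estimators for ridge regression admit well-known closed forms via the hat matrix, which is what makes them cheap enough to tune over a penalty grid. The sketch below (illustrative only; not the thesis's extended estimators, and the data-generating setup is hypothetical) computes both shortcuts and selects the penalty minimizing GCV:

```python
import numpy as np

def ridge_gcv_loocv(X, y, lam):
    """Classical GCV and LOOCV shortcut estimates of ridge test error.

    Uses the hat matrix H = X (X'X + n*lam*I)^{-1} X'. LOOCV rescales each
    residual by its leverage 1 - H_ii; GCV replaces the per-point leverages
    by their average 1 - tr(H)/n.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    resid = y - H @ y
    loocv = np.mean((resid / (1.0 - np.diag(H))) ** 2)
    gcv = np.mean(resid**2) / (1.0 - np.trace(H) / n) ** 2
    return gcv, loocv

# Hypothetical example: tune lam by minimizing GCV over a grid.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) / np.sqrt(p) + rng.standard_normal(n)
grid = np.logspace(-3, 1, 20)
scores = [ridge_gcv_loocv(X, y, lam)[0] for lam in grid]
best_lam = grid[int(np.argmin(scores))]
```

Note that the thesis's consistency results also cover negative penalties; the grid above stays nonnegative only so that the matrix inverse is guaranteed to exist in this toy setup.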
Risk monotonization. We develop a framework that modifies any generic prediction procedure so that its risk becomes asymptotically monotonic in the sample size. As part of our framework, we propose two data-driven methodologies, namely zero-step and one-step, that are akin to bagging and boosting, respectively, and show that under very mild assumptions they achieve asymptotically monotonic risk. Our results are applicable to a wide class of prediction procedures and loss functions, and do not assume a well-specified model. We exemplify our framework with concrete analyses of the ridgeless and lassoless procedures.
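The spirit of the zero-step approach can be caricatured as: refit the base procedure on subsamples of several sizes, estimate each size's risk on a held-out split, and predict with the best size, so that extra data can never hurt. The sketch below (a hypothetical helper, not the thesis's exact construction) applies this recipe to the ridgeless procedure, i.e., min-norm least squares:

```python
import numpy as np

def zero_step_monotone(fit, X, y, X_test, n_grid=None, seed=0):
    """Zero-step-style monotonization sketch (illustrative, not exact).

    `fit(X, y)` must return a predictor callable. Subsample sizes are
    compared by estimated risk on a held-out fifth of the data.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)
    val, tr = perm[: n // 5], perm[n // 5:]
    if n_grid is None:
        n_grid = [len(tr) // 4, len(tr) // 2, len(tr)]
    best_risk, best_model = np.inf, None
    for k in n_grid:
        sub = rng.choice(tr, size=k, replace=False)
        model = fit(X[sub], y[sub])
        risk = np.mean((y[val] - model(X[val])) ** 2)  # held-out risk estimate
        if risk < best_risk:
            best_risk, best_model = risk, model
    return best_model(X_test), best_risk

def ridgeless(Xtr, ytr):
    """Min-norm least squares (the ridgeless procedure)."""
    beta = np.linalg.pinv(Xtr) @ ytr
    return lambda Xq: Xq @ beta

# Hypothetical overparameterized setup near the interpolation threshold.
rng = np.random.default_rng(1)
X = rng.standard_normal((120, 100))
y = X @ (rng.standard_normal(100) / 10) + rng.standard_normal(120)
X_test = rng.standard_normal((10, 100))
preds, risk = zero_step_monotone(ridgeless, X, y, X_test)
```

Because the selected subsample size can be smaller than the full training set, the procedure sidesteps the non-monotonic risk spike that ridgeless regression suffers near the interpolation threshold.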
Model complexity. We revisit model complexity through the lens of model optimism and degrees of freedom. By re-interpreting degrees of freedom in the fixed-X prediction setting, we extend this concept to the random-X prediction setting. We then define a family of complexity measures, whose two extreme ends we call the emergent and intrinsic degrees of freedom of a prediction model. Through linear and nonlinear example models, we illustrate how the proposed measures may help reconcile the subtle multiple-descent behavior with the typical single-descent behavior observed in classical statistical prediction.
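The classical fixed-X starting point being re-interpreted here is the covariance form of degrees of freedom and its link to optimism; a standard statement (not the thesis's extended random-X definition) for a fit $\hat f$ with noise variance $\sigma^2$ is:

```latex
\mathrm{df}(\hat f) = \frac{1}{\sigma^2} \sum_{i=1}^{n} \operatorname{Cov}\!\bigl(\hat f(x_i),\, y_i\bigr),
\qquad
\mathbb{E}\bigl[\mathrm{Err}_{\text{fixed-}X}\bigr] - \mathbb{E}\bigl[\mathrm{err}_{\text{train}}\bigr]
= \frac{2\sigma^2}{n}\,\mathrm{df}(\hat f).
```

For ridge regression this reduces to $\mathrm{df} = \operatorname{tr}(H)$ with $H$ the hat matrix, recovering the usual effective-parameter count; the random-X extension replaces fixed-design optimism with out-of-sample optimism.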
- Statistics and Data Science
- Doctor of Philosophy (PhD)