jwieczor_statistics_2018.pdf
Model Selection and Stopping Rules for High-Dimensional Forward Selection
Forward Selection (FS) is a popular variable selection method for linear regression. Working in a sparse high-dimensional setting, we derive sufficient conditions for FS to attain model-selection consistency, assuming the true model size is known. Compared with earlier results for the closely related Orthogonal Matching Pursuit (OMP), our conditions are similar but are obtained via a different argument. We also demonstrate why a submodularity-based argument is not fruitful for the purpose of correct model recovery.
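To make the FS procedure concrete, the following is a minimal sketch of greedy forward selection with a known model size k: at each step, add the predictor that most reduces the training residual sum of squares. This is an illustrative textbook version, not the dissertation's exact algorithm; the function name and the intercept-free least-squares fit are assumptions for the example.

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedy Forward Selection (illustrative sketch).

    At each step, add the predictor whose inclusion most reduces the
    residual sum of squares of an (intercept-free) least-squares fit,
    stopping after k variables have been chosen.
    """
    n, p = X.shape
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            # Least-squares fit on the candidate set of columns
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected
```

Under a strong-signal, low-noise design, this greedy search recovers the true support, which is the regime the consistency conditions in the abstract characterize.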
Since the true model size is rarely known in practice, we also derive sufficient conditions for model-selection consistency of FS with a data-driven stopping rule based on a sequential variant of cross-validation (CV). As a by-product of our proofs, we also obtain a sharp (sufficient and almost necessary) condition for model-selection consistency when using "wrapper" forward search for linear regression; this appears to be the first consistency result for any wrapper model-selection method. We illustrate the intuition behind, and demonstrate the performance of, our methods using simulation studies and real datasets.
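The flavor of a data-driven stopping rule can be sketched as follows: grow the FS path on a training set and stop as soon as held-out prediction error stops improving. This is a simplified single-split stand-in for the sequential CV rule studied in the dissertation; the function name, the train/validation split, and the intercept-free fits are assumptions for illustration.

```python
import numpy as np

def fs_with_cv_stop(X, y, X_val, y_val):
    """Forward Selection with a held-out stopping rule (illustrative sketch).

    Grow the model greedily on (X, y); after each addition, evaluate mean
    squared prediction error on (X_val, y_val), and stop as soon as that
    error fails to decrease. Simplified stand-in for sequential CV.
    """
    p = X.shape[1]
    selected = []
    best_val = np.mean(y_val ** 2)  # error of the empty (all-zero) model
    for _ in range(p):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        cols = selected + [best_j]
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        val_mse = np.mean((y_val - X_val[:, cols] @ beta) ** 2)
        if val_mse >= best_val:
            break  # held-out error no longer improves: stop growing
        selected.append(best_j)
        best_val = val_mse
    return selected
```

With a strong true signal, the held-out error drops sharply while the true variables enter and then flattens or rises once only noise variables remain, so the rule stops near the true model size without that size being supplied.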
History
Date
- 2018-04-01

Degree Type
- Dissertation
Department
- Statistics
Degree Name
- Doctor of Philosophy (PhD)