10.1184/R1/7583870.v1
Jerzy Wieczorek
Jerzy
Wieczorek
Model Selection and Stopping Rules for High-Dimensional Forward Selection
Carnegie Mellon University
2018
High-Dimensional Forward Selection
2018-04-01 00:00:00
Thesis
https://kilthub.cmu.edu/articles/thesis/Model_Selection_and_Stopping_Rules_for_High-Dimensional_Forward_Selection/7583870
Forward Selection (FS) is a popular variable selection method for linear regression. Working in a sparse high-dimensional setting, we derive sufficient conditions for FS to attain model-selection consistency, assuming the true model size is known. Compared with earlier results for the closely related Orthogonal Matching Pursuit (OMP), our conditions are similar but obtained using a different argument. We also demonstrate why a submodularity-based argument is not fruitful for the purpose of correct model recovery.

Since the true model size is rarely known in practice, we also derive sufficient conditions for model-selection consistency of FS with a data-driven stopping rule, based on a sequential variant of cross-validation (CV). As a by-product of our proofs, we also obtain a sharp (sufficient and almost necessary) condition for model-selection consistency when using "wrapper" forward search for linear regression. This appears to be the first consistency result for any wrapper model-selection method. We illustrate intuition and demonstrate performance of our methods using simulation studies and real datasets.
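For readers unfamiliar with FS, the following is a minimal sketch of the textbook greedy procedure in the first setting described above, where the model size k is assumed known. It is not taken from the thesis: the function names `dot` and `forward_selection`, the column-list data layout, and the toy data are all illustrative assumptions, and the CV-based stopping rule is not implemented here. At each step the sketch adds the predictor that most reduces the residual sum of squares, tracking this by orthogonalizing the remaining predictors against those already chosen.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def forward_selection(columns, y, k):
    """Greedy forward selection for least-squares linear regression.

    columns: list of predictor columns, each a list of n floats (hypothetical layout).
    y: response, a list of n floats.
    k: number of predictors to select (assumed known, as in the first setting).
    Returns the indices of the selected predictors in order of entry.
    """
    residual = list(y)
    # Working copies of the predictors, kept orthogonal to everything selected so far.
    work = [list(c) for c in columns]
    selected = []
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j, x in enumerate(work):
            if j in selected:
                continue
            norm_sq = dot(x, x)
            if norm_sq < 1e-12:
                continue  # numerically in the span of the selected predictors
            # Reduction in residual sum of squares from adding predictor j.
            gain = dot(residual, x) ** 2 / norm_sq
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # no remaining predictor reduces the residual
        selected.append(best_j)
        q = work[best_j]
        q_norm_sq = dot(q, q)
        # Project the chosen direction out of the residual...
        coef = dot(residual, q) / q_norm_sq
        residual = [r - coef * qi for r, qi in zip(residual, q)]
        # ...and out of every predictor not yet selected.
        for j, x in enumerate(work):
            if j in selected:
                continue
            c = dot(x, q) / q_norm_sq
            work[j] = [xi - c * qi for xi, qi in zip(x, q)]
    return selected


# Toy data (illustrative only): y depends on predictors 1 and 2, not on 0.
toy_columns = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
toy_y = [0.0, 3.0, 1.0, 0.0]
toy_selected = forward_selection(toy_columns, toy_y, 2)  # [1, 2]
```

In this toy case the predictor with the larger coefficient enters first. A data-driven stopping rule such as the one studied in the thesis would replace the fixed k with a criterion evaluated along the selection path.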