Model Selection and Stopping Rules for High-Dimensional Forward Selection

thesis
posted on 01.04.2018 by Jerzy Wieczorek
Forward Selection (FS) is a popular variable selection method for linear regression. Working in a sparse high-dimensional setting, we derive sufficient conditions for FS to attain model-selection consistency, assuming the true model size is known. Compared with earlier results for the closely related Orthogonal Matching Pursuit (OMP), our conditions are similar but are obtained using a different argument. We also demonstrate why a submodularity-based argument is not fruitful for the purpose of correct model recovery.
Since the true model size is rarely known in practice, we also derive sufficient conditions for model-selection consistency of FS with a data-driven stopping rule, based on a sequential variant of cross-validation (CV). As a by-product of our proofs, we also obtain a sharp (sufficient and almost necessary) condition for model-selection consistency when using "wrapper" forward search for linear regression. This appears to be the first consistency result for any wrapper model-selection method. We illustrate the intuition behind our methods and demonstrate their performance using simulation studies and real datasets.
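
For readers unfamiliar with FS, the greedy procedure the abstract refers to can be sketched as follows for the case where the model size k is assumed known. This is a minimal illustration and not code from the thesis: the function names (dot, forward_selection), the tolerance 1e-12, and the toy data are all assumptions made here. It assumes there is no intercept (the predictors and response are taken to be centered already), and it does not implement the sequential CV stopping rule, whose details are not given on this page.

```python
def dot(u, v):
    """Inner product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def forward_selection(columns, y, k):
    """Greedy forward selection of k predictors (hypothetical helper).

    columns: list of predictor columns, each a list of n floats
    y:       response, a list of n floats
    k:       model size, assumed known in this sketch
    Returns the selected column indices in order of entry.
    """
    residual = list(y)
    # Working copies, kept orthogonal to every column selected so far.
    work = [list(c) for c in columns]
    selected = []
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j, x in enumerate(work):
            if j in selected:
                continue
            norm_sq = dot(x, x)
            if norm_sq < 1e-12:  # (nearly) in the span of selected columns
                continue
            # Reduction in residual sum of squares if column j is added.
            gain = dot(x, residual) ** 2 / norm_sq
            if best_j is None or gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:  # no usable column left
            break
        selected.append(best_j)
        q = work[best_j]
        q_norm_sq = dot(q, q)
        # Project the chosen direction out of the residual ...
        coef = dot(q, residual) / q_norm_sq
        residual = [r - coef * a for r, a in zip(residual, q)]
        # ... and out of every column not yet selected.
        for j, x in enumerate(work):
            if j not in selected:
                c = dot(q, x) / q_norm_sq
                work[j] = [b - c * a for a, b in zip(q, x)]
    return selected


# Toy example with an orthogonal design: the response loads mostly on
# column 1, then on column 2, and only slightly on column 0.
toy_columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
toy_y = [0.1, 5.0, 2.0, 0.0]
print(forward_selection(toy_columns, toy_y, k=2))  # -> [1, 2]
```

At each step the sketch adds the predictor that most reduces the residual sum of squares given the predictors already chosen. OMP, by contrast, ranks candidates by their correlation with the current residual without re-orthogonalizing the candidates.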

History

Date

01/04/2018

Degree Type

Dissertation

Department

Statistics

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Jing Lei
