Optimality of Graphlet Screening in High Dimensional Variable Selection
Consider a linear model Y = Xβ + σz, where X has n rows and p columns and z ∼ N(0, I_n). We assume both p and n are large, including the case of p ≫ n. The unknown signal vector β is assumed to be sparse in the sense that only a small fraction of its components are nonzero. The goal is to identify such nonzero coordinates (i.e., variable selection).
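As a concrete illustration, the sketch below simulates such a rare and weak setting in Python with numpy. The dimensions, sparsity level, and signal strength τ are illustrative choices, not values from the paper; the columns of X are normalized so that the Gram matrix has unit diagonal, matching the normalization assumed below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 1000                        # large n and p, with p > n
sparsity, tau, sigma = 0.02, 3.0, 1.0   # illustrative: rare (2%) and weak signals

# Design matrix with unit-norm columns, so diag(X'X) = 1
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)

# Sparse signal: a small fraction of nonzero coordinates with random signs
beta = np.zeros(p)
support = rng.choice(p, size=int(sparsity * p), replace=False)
beta[support] = tau * rng.choice([-1.0, 1.0], size=support.size)

y = X @ beta + sigma * rng.standard_normal(n)   # Y = X beta + sigma z
```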
We are primarily interested in the regime where signals are both rare and weak, so that successful variable selection is challenging but still possible. Research on rare and weak signals to date has focused on the unstructured case, where the Gram matrix G = X'X is nearly orthogonal. In this paper, G is only assumed to be sparse in the sense that each row of G has relatively few large coordinates (the diagonal entries of G are normalized to 1). The sparsity of G naturally induces the sparsity of the so-called graph of strong dependence (GOSD). The key insight is that there is an interesting interplay between the signal sparsity and the graph sparsity: in a broad context, the signals decompose into many small-size components of the GOSD that are disconnected from each other.
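This decomposition can be made concrete by continuing the sketch above: threshold the off-diagonal entries of G to form the GOSD, then count the connected components induced by the signal coordinates. The cutoff δ below is a hypothetical illustrative choice; the paper treats the sparsity of G abstractly rather than through a fixed threshold.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Gram matrix; diag(G) = 1 because the columns of X were normalized above
G = X.T @ X

# GOSD: an edge between j and k whenever |G[j, k]| exceeds a threshold delta
delta = 0.25                  # hypothetical cutoff for "strong dependence"
A = np.abs(G) > delta
np.fill_diagonal(A, False)

# Restricted to the support, the GOSD typically splits into many small,
# mutually disconnected components -- the signal/graph sparsity interplay
A_supp = csr_matrix(A[np.ix_(support, support)])
n_comp, _ = connected_components(A_supp, directed=False)
print(f"{support.size} signal coordinates fall into {n_comp} GOSD components")
```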
We propose Graphlet Screening (GS) for variable selection. This is a two-step Screen and Clean procedure, where in the first step, we screen subgraphs of the GOSD with sequential χ²-tests, and in the second step, we clean with penalized MLE. The main methodological innovation is to use the GOSD to guide both the screening and cleaning processes.
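A minimal sketch of the screening step is given below, reusing X, y, A, and σ from the snippets above. It tests each node together with its GOSD neighbors via a χ² statistic on the least-squares fit over the candidate subgraph. This only illustrates the idea: the actual GS recursion over subgraphs and its tuning are more involved, and the cleaning step (a penalized MLE fit within each surviving component) is omitted here.

```python
from scipy.stats import chi2

def gosd_screen(X, y, A, sigma, alpha=1e-3):
    """Chi^2 screening over small subgraphs of the GOSD (illustrative sketch)."""
    p = X.shape[1]
    survivors = set()
    for j in range(p):
        nbrs = np.flatnonzero(A[j])
        # candidate subgraphs: the singleton {j} and each GOSD edge {j, k}
        for S in [[j]] + [[j, k] for k in nbrs if k > j]:
            XS = X[:, S]
            coef, *_ = np.linalg.lstsq(XS, y, rcond=None)
            fitted = XS @ coef
            stat = fitted @ fitted / sigma**2    # ~ chi^2_|S| under the null
            if stat > chi2.ppf(1 - alpha, df=len(S)):
                survivors.update(S)
    return np.array(sorted(survivors))

kept = gosd_screen(X, y, A, sigma)
```

Because the surviving nodes inherit the small-component structure of the GOSD, the subsequent penalized-MLE cleaning only ever needs to solve low-dimensional subproblems.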
For any variable selection procedure β̂, we measure its performance with the Hamming distance between the sign vectors of β̂ and β, and assess optimality by the convergence rate of the Hamming distance. Compared with more stringent criteria such as exact support recovery or the oracle property, which demand strong signals, the Hamming distance criterion is more appropriate for weak signals, since it naturally allows a small fraction of errors.
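In code, this criterion is simply the number of coordinates where the estimated and true sign vectors disagree; a one-function sketch:

```python
def hamming_sign_error(beta_hat, beta):
    """Hamming distance between sign vectors: counts false positives,
    false negatives, and sign flips alike."""
    return int(np.sum(np.sign(beta_hat) != np.sign(beta)))
```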
We show that in a broad class of situations, Graphlet Screening achieves the optimal rate of convergence in terms of the Hamming distance. Well-known procedures such as the L0-penalization and L1-penalization methods do not utilize the graph structure for variable selection, so they generally do not achieve the optimal rate of convergence, even in very simple settings and even when the tuning parameters are ideally set.