posted on 2008-05-01, 00:00, authored by Martin Azizyan, Aarti Singh, Larry Wasserman
<p>Semisupervised methods use labeled data (X<sub>1</sub>, Y<sub>1</sub>),...,(X<sub>n</sub>, Y<sub>n</sub>) together with unlabeled data X<sub>n+1</sub>,...,X<sub>N</sub> to make predictions. These methods invoke an assumption that links the marginal distribution P<sub>X</sub> of X to the regression function f(x). For example, it is common to assume that f is very smooth over high-density regions of P<sub>X</sub>. Many of these methods are ad hoc: they have been shown to work in specific examples but lack a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution P<sub>X</sub>. Our model includes a parameter α that controls the strength of the semisupervised assumption. We then use the data to adapt to α.</p>
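<p>To make the idea of a metric that is sensitive to P<sub>X</sub> more concrete, here is a minimal sketch, not the authors' estimator. It assumes a Gaussian kernel density estimate p̂ computed from all N points (labeled and unlabeled), plus a graph approximation in which each edge (i, j) costs exp(-α·p̂(midpoint))·‖x<sub>i</sub> − x<sub>j</sub>‖. Distances are then shortest paths found with Dijkstra's algorithm. The function names, the bandwidth <code>h</code>, and the midpoint rule are illustrative assumptions. With α = 0 the metric reduces to Euclidean distance. Larger α discounts path segments through high-density regions, which encodes a stronger semisupervised assumption.</p>

```python
import heapq
import math


def gaussian_kde(points, h):
    """Return p_hat(x): a Gaussian kernel density estimate with bandwidth h.

    The bandwidth and kernel choice are illustrative assumptions.
    """
    n, d = len(points), len(points[0])
    norm = n * (math.sqrt(2 * math.pi) * h) ** d

    def p_hat(x):
        total = 0.0
        for z in points:
            sq = sum((a - b) ** 2 for a, b in zip(x, z))
            total += math.exp(-sq / (2 * h * h))
        return total / norm

    return p_hat


def density_sensitive_distances(points, source, alpha, h=0.5):
    """Shortest-path distances from points[source] on the complete graph.

    Hypothetical edge cost: exp(-alpha * p_hat(midpoint)) * ||x_u - x_v||.
    alpha = 0 recovers Euclidean distance; larger alpha makes paths through
    dense regions relatively shorter.
    """
    p_hat = gaussian_kde(points, h)  # density uses labeled + unlabeled points
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:  # standard Dijkstra with a binary heap
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v in range(n):
            if v == u:
                continue
            mid = [(a + b) / 2 for a, b in zip(points[u], points[v])]
            length = math.dist(points[u], points[v])
            cost = math.exp(-alpha * p_hat(mid)) * length
            if d_u + cost < dist[v]:
                dist[v] = d_u + cost
                heapq.heappush(heap, (dist[v], v))
    return dist
```

<p>For example, <code>density_sensitive_distances([(0.0,), (0.2,), (0.4,), (3.0,), (3.2,)], 0, alpha=5.0)</code> returns distances from the first point. Because each edge cost is non-increasing in α, these are no larger than the α = 0 (Euclidean) distances. The complete graph costs O(N<sup>2</sup>) kernel-density evaluations, so a practical version would likely restrict edges to nearest neighbours.</p>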