Posted on 2000-05-01, 00:00 · Authored by John Lafferty, Larry Wasserman
Semi-supervised methods use unlabeled data in addition to labeled data to construct
predictors. While existing semi-supervised methods have shown some
promising empirical performance, their development has been largely based
on heuristics. In this paper we study semi-supervised learning from the viewpoint
of minimax theory. Our first result shows that some common methods based on
regularization using graph Laplacians do not lead to faster minimax rates of convergence.
Thus, the estimators that use the unlabeled data do not have smaller
risk than the estimators that use only labeled data. We then develop several new
approaches that provably lead to improved performance. The statistical tools of
minimax analysis are thus used to offer some new perspective on the problem of
semi-supervised learning.
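
As a concrete illustration of the graph-Laplacian regularization the abstract refers to, the sketch below shows one common instance of such a method: Laplacian-regularized least squares, in which unlabeled points enter the estimator only through a graph Laplacian penalty built from labeled and unlabeled inputs together. This is a minimal sketch, not the construction analyzed in the paper; the Gaussian kernel, the function name laplacian_rls, and the parameters gamma, lam, and sigma are all illustrative assumptions.

```python
import numpy as np

def laplacian_rls(X_labeled, y_labeled, X_unlabeled, gamma=1.0, lam=0.1, sigma=1.0):
    """Illustrative Laplacian-regularized least squares (a graph-based
    semi-supervised regression method). All names and defaults are
    assumptions for this sketch, not the paper's estimator."""
    X = np.vstack([X_labeled, X_unlabeled])
    n, l = X.shape[0], X_labeled.shape[0]

    # Gaussian-kernel similarity graph over labeled + unlabeled points
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian

    # Kernel matrix for a kernel expansion f = K @ alpha (same Gaussian kernel here)
    K = np.exp(-sq_dists / (2 * sigma ** 2))

    # J selects the labeled rows; minimize over alpha:
    #   ||y - J K alpha||^2 + lam * alpha' K alpha + gamma * alpha' K L K alpha
    J = np.zeros((l, n))
    J[np.arange(l), np.arange(l)] = 1.0
    A = K @ J.T @ J @ K + lam * K + gamma * K @ L @ K
    b = K @ J.T @ y_labeled
    alpha = np.linalg.solve(A + 1e-8 * np.eye(n), b)

    def predict(X_new):
        sq = np.sum((X_new[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2 * sigma ** 2)) @ alpha

    return predict
```

The point of the sketch is only to show where the unlabeled data enters: the Laplacian penalty term (weighted by gamma) uses all points, while the data-fit term uses only the labeled ones. The paper's minimax analysis asks whether this kind of use of unlabeled data can actually improve convergence rates.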