Posted on 1998-04-01, authored by Christopher James Langmead.
Abstract: "We present a randomized algorithm for semi-supervised learning of Mahalanobis metrics over $\mathbb{R}^n$. The inputs to the algorithm are a set $U$ of unlabeled points in $\mathbb{R}^n$; a set of pairs of points $S = \{(x, y)_i\}$, $x, y \in U$, that are known to be similar; and a set of pairs of points $D = \{(x, y)_i\}$, $x, y \in U$, that are known to be dissimilar. The algorithm randomly samples $S$, $D$, and $m$-dimensional subspaces of $\mathbb{R}^n$ and learns a metric for each subspace. The metric over $\mathbb{R}^n$ is a linear combination of the subspace metrics. The randomization addresses issues of efficiency and overfitting. Extensions of the algorithm to learning non-linear metrics via kernels, and its use as a pre-processing step for dimensionality reduction, are discussed. The new method is demonstrated on a regression problem (structure-based chemical shift prediction) and a classification problem (predicting clinical outcomes for immunomodulatory strategies for treating severe sepsis)."
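The abstract describes the overall scheme (sample pairs and $m$-dimensional subspaces, learn one metric per subspace, combine them linearly) but not the per-subspace learner, how subspaces are drawn, or the combination weights. The sketch below fills those gaps with explicit assumptions, so it is an illustration of the random-subspace idea and not the author's algorithm: subspaces are random projections (orthonormal bases from a QR factorization), each subspace metric is an RCA-style stand-in (inverse regularized second moment of similar-pair differences, rescaled using the dissimilar pairs), and the subspace metrics are averaged with uniform weights. The function names `learn_subspace_metric`, `random_subspace_metric`, and `mahalanobis_sq`, the parameters `frac` and `reg`, and the representation of $S$ and $D$ as index pairs into $U$ are all hypothetical.

```python
import numpy as np


def learn_subspace_metric(P, U, S, D, reg=1e-3):
    """Hypothetical per-subspace learner: an RCA-style stand-in, not the paper's method.

    P: (n, m) orthonormal basis of the subspace; U: (N, n) points;
    S, D: lists of (i, j) index pairs into U (similar / dissimilar).
    """
    m = P.shape[1]
    # Pair differences projected into the m-dimensional subspace.
    ds = np.array([P.T @ (U[i] - U[j]) for i, j in S])
    dd = np.array([P.T @ (U[i] - U[j]) for i, j in D])
    # Inverse of the regularized second-moment matrix of similar-pair differences:
    # directions in which similar points differ a lot receive small weight.
    C = ds.T @ ds / len(ds) + reg * np.eye(m)
    M = np.linalg.inv(C)
    # Assumption: D is used only to fix the scale (mean squared distance over D = 1).
    scale = np.mean(np.einsum("ki,ij,kj->k", dd, M, dd))
    return M / scale


def random_subspace_metric(U, S, D, m, n_subspaces=20, frac=0.5, seed=0):
    """Average of subspace metrics lifted back to R^n (uniform weights assumed)."""
    rng = np.random.default_rng(seed)
    n = U.shape[1]
    M_full = np.zeros((n, n))
    for _ in range(n_subspaces):
        # Random m-dimensional subspace: orthonormal columns from QR of a Gaussian matrix.
        P, _ = np.linalg.qr(rng.standard_normal((n, m)))
        # Random subsamples (without replacement) of the similar and dissimilar pairs.
        S_k = [S[t] for t in rng.choice(len(S), max(1, int(frac * len(S))), replace=False)]
        D_k = [D[t] for t in rng.choice(len(D), max(1, int(frac * len(D))), replace=False)]
        M_k = learn_subspace_metric(P, U, S_k, D_k)
        # d^T (P M_k P^T) d is the subspace distance of the projected difference P^T d.
        M_full += P @ M_k @ P.T / n_subspaces
    return M_full


def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)


if __name__ == "__main__":
    # Toy data: 40 random points in R^6 with made-up similar / dissimilar pairs.
    rng = np.random.default_rng(1)
    U = rng.standard_normal((40, 6))
    S = [(i, i + 1) for i in range(0, 20, 2)]
    D = [(i, i + 20) for i in range(10)]
    M = random_subspace_metric(U, S, D, m=3)
    print(M.shape, mahalanobis_sq(M, U[0], U[1]))
```

Because each lifted term $P_k M_k P_k^\top$ is positive semidefinite, any non-negative combination of them is again a valid Mahalanobis matrix; uniform weights are simply the easiest such choice here. Learning on small $m \times m$ problems with random subsets of $S$ and $D$ is what keeps each step cheap, which matches the efficiency motivation the abstract gives for the randomization.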