posted on 2002-09-01, 00:00 · authored by Maria-Florina Balcan, Avrim Blum, Nathan Srebro
We continue the investigation of natural conditions
for a similarity function to allow learning, without
requiring the similarity function to be a valid kernel,
or referring to an implicit high-dimensional
space. We provide a new notion of a “good similarity
function” that builds upon the previous definition
of Balcan and Blum (2006) but improves
on it in two important ways. First, as with the previous
definition, any large-margin kernel is also a
good similarity function in our sense, but the translation
now results in a much milder increase in the
labeled sample complexity. Second, we prove that
for distribution-specific PAC learning, our new notion
is strictly more powerful than the traditional
notion of a large-margin kernel. In particular, we
show that for any hypothesis class C there exists
a similarity function under our definition allowing
learning with O(log |C|) labeled examples.
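(A brief gloss, ours rather than the abstract's: the logarithmic rate is what one expects from standard L₁-margin bounds, under which linear separators of bounded 1-norm over d features need

$$m = O\!\left(\frac{\log d}{\gamma^{2}}\right)$$

labeled examples; taking d = |C| similarity-based features and constant margin γ gives O(log |C|).)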
However, in a lower bound which may be of independent
interest, we show that for any class C of pairwise
uncorrelated functions, there is no kernel with
margin γ ≥ 8/√|C| for all f ∈ C, even if one
allows average hinge-loss as large as 0.5. Thus,
the sample complexity for learning such classes
with SVMs is Ω(|C|). This extends work of Ben-David et al. (2003)
and Forster and Simon (2006), who give hardness results
with comparable margin bounds, but at much lower error rates.
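As a gloss under the standard assumption that margin-based SVM guarantees scale as 1/γ², the margin ceiling translates directly into the stated sample complexity:

$$\gamma \le \frac{8}{\sqrt{|C|}} \;\Longrightarrow\; \frac{1}{\gamma^{2}} \ge \frac{|C|}{64} = \Omega(|C|).$$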
Our new notion of similarity relies upon L₁-regularized
learning, and our separation result is related to a
separation between what is learnable with L₁ vs. L₂ regularization.
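To make the L₁-regularized route concrete, here is a minimal sketch of a "similarities as features" learning rule in the spirit the abstract describes. Everything specific in it is an illustrative assumption rather than the paper's construction: the RBF-style similarity, the number of landmarks, and scikit-learn's L1-penalized logistic regression standing in for a generic L₁-regularized linear learner.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical similarity function (an RBF-style similarity; the framework
# allows any bounded pairwise similarity, not only valid kernels).
def sim(x, y, width=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * width ** 2))

# Toy labeled data: two Gaussian blobs in 5 dimensions.
X = np.vstack([rng.normal(-1.0, 1.0, (100, 5)),
               rng.normal(+1.0, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

# Step 1: pick "landmark" points (drawn here from the data for brevity)
# and represent each example by its similarities to the landmarks.
landmarks = X[rng.choice(len(X), size=30, replace=False)]
F = np.array([[sim(x, lm) for lm in landmarks] for x in X])

# Step 2: fit a sparse linear separator over the similarity features.
# The L1 penalty keeps the separator sparse, which is what ties the
# labeled sample complexity to log(#landmarks) rather than #landmarks.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(F, y)

print("training accuracy:", clf.score(F, y))
print("nonzero weights:  ", np.count_nonzero(clf.coef_))
```

The design point this illustrates is the contrast drawn in the abstract: an SVM works with the similarity implicitly through an L₂-type margin, whereas the L₁ route treats similarities as explicit features and exploits sparsity, which is where the logarithmic dependence on the number of features comes from.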