Carnegie Mellon University

Improved Guarantees for Learning via Similarity Functions

journal contribution, authored by Maria-Florina Balcan, Avrim Blum, and Nathan Srebro (2008)
We continue the investigation of natural conditions for a similarity function to allow learning, without requiring the similarity function to be a valid kernel or referring to an implicit high-dimensional space. We provide a new notion of a “good similarity function” that builds upon the previous definition of Balcan and Blum (2006) but improves on it in two important ways. First, as with the previous definition, any large-margin kernel is also a good similarity function in our sense, but the translation now results in a much milder increase in the labeled sample complexity. Second, we prove that for distribution-specific PAC learning, our new notion is strictly more powerful than the traditional notion of a large-margin kernel. In particular, we show that for any hypothesis class C there exists a similarity function under our definition allowing learning with O(log |C|) labeled examples. However, in a lower bound which may be of independent interest, we show that for any class C of pairwise uncorrelated functions, there is no kernel with margin γ ≥ 8/√|C| for all f ∈ C, even if one allows average hinge-loss as large as 0.5. Thus, the sample complexity for learning such classes with SVMs is Ω(|C|). This extends work of Ben-David et al. (2003) and Forster and Simon (2006), who give hardness results with comparable margin bounds but at much lower error rates.

Our new notion of similarity relies upon L1-regularized learning, and our separation result is related to a separation between what is learnable with L1 vs. L2 regularization.
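The paper itself is theoretical, but the style of learning rule its definitions support can be illustrated concretely: map each example to its vector of similarities to a set of sampled “landmark” points, then train an L1-regularized linear separator in that similarity space. The Python sketch below shows this pipeline under stated assumptions; the cosine similarity, the landmark count, the synthetic data, and the scikit-learn dependency are all illustrative choices, not code or parameters from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def similarity_features(X, landmarks, sim):
    """Map each row x of X to (sim(x, l_1), ..., sim(x, l_d)) over d landmarks."""
    return np.array([[sim(x, l) for l in landmarks] for x in X])

def cosine_sim(x, y):
    # An arbitrary bounded similarity; it need not be a valid (PSD) kernel.
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # synthetic examples (illustrative)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a simple target concept

# Sample landmark points (in the theory these can be unlabeled examples),
# build the similarity feature map, and fit an L1-regularized separator.
landmarks = X[rng.choice(len(X), size=20, replace=False)]
F = similarity_features(X, landmarks, cosine_sim)
clf = LogisticRegression(penalty="l1", solver="liblinear").fit(F, y)

print("training accuracy:", clf.score(F, y))
print("nonzero weights:", int((clf.coef_ != 0).sum()))
```

The L1 penalty mirrors the abstract’s point that the new notion of a good similarity function relies on L1-regularized learning: it concentrates weight on a few informative landmarks rather than spreading it across all similarity coordinates, as an L2 penalty would.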
