The Power of Localization for Efficiently Learning Linear Separators with Noise
We introduce a new approach for designing computationally efficient and noise tolerant algorithms for learning linear separators. We consider the malicious noise model of Valiant [41, 32] and the adversarial label noise model of Kearns, Schapire, and Sellie [34]. For malicious noise, where the adversary can corrupt an η of fraction both the label part and the feature part, we provide a polynomial-time algorithm for learning linear separators in Rd under the uniform distribution with nearly information-theoretically optimal noise tolerance of η = Ω(ε), improving on the Ω(&epsilon/d1/4) noise-tolerance of [31] and the Ω(ε2/log(d/ε) of [35]. For the adversarial label noisemodel, where the distribution over the feature vectors is unchanged, and the overall probability of a noisy label is constrained to be at most η, we give a polynomial-time algorithm for learning linear separators in Rd under the uniform distribution that can also handle a noise rate of η = Ω(ε). This improves over the results of [31] which either required runtime super-exponential in 1/ε (ours is polynomial in 1/ε) or tolerated less noise.
In the case that the distribution is isotropic log-concave, we present a polynomial-time algorithm for the malicious noise model that tolerates Ω(ε/log2(1/ε)) noise, and a polynomial-time algorithm for the adversarial label noise model that also handles Ω(ε/log2(1/ε)) noise. Both of these also improve on results from [35]. In particular, in the case of malicious noise, unlike previous results, our noise tolerance has no dependence on the dimension d of the space.
Our algorithms are also efficient in the active learning setting, where learning algorithms only receive the classifications of examples when they ask for them. We show that, in this model, our algorithms achieve a label complexity whose dependence on the error parameter ε is polylogarithmic (and thus exponentially better than that of any passive algorithm). This provides the first polynomial time active learning algorithm for learning linear separators in the presence of malicious noise or adversarial label noise.