Carnegie Mellon University

Stability Yields a PTAS for k-Median and k-Means Clustering

journal contribution
posted on 1984-01-01, 00:00 authored by Pranjal Awasthi, Avrim Blum, Or Sheffet

We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. show that if the optimal (k - 1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/ε², then one can achieve a (1 + f(ε))-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k - 1)-means optimal is more expensive than the k-means optimal by a factor 1 + α for some constant α > 0, we can obtain a PTAS. In particular, under this assumption, for any ε > 0 we achieve a (1 + ε)-approximation to the k-means optimal in time polynomial in n and k, and exponential in 1/ε and 1/α. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we in addition give a randomized algorithm with improved running time of n^{O(1)} (k log n)^{poly(1/ε, 1/α)}. Our technique also obtains a PTAS under the assumption of Balcan et al. that all (1 + α)-approximations are δ-close to a desired target clustering, in the case that all target clusters have size greater than δn and α > 0 is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median we improve the "largeness" condition needed to get exactly δ-close from O(δn) to δn. Our results are based on a new notion of clustering stability.
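To make the separation assumption in the abstract concrete, the following minimal Python sketch checks whether the optimal (k - 1)-means cost is at least (1 + α) times the optimal k-means cost on a tiny input. It is not the paper's algorithm: the optimum is found by exhaustive search over all labelings, which is feasible only for a handful of points, and the function names (`kmeans_cost`, `optimal_kmeans_cost`, `is_separated`) are hypothetical, chosen for illustration only.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for c in range(k):
        cluster = [p for p, lab in zip(points, labels) if lab == c]
        if not cluster:
            continue  # an empty cluster contributes no cost
        dim = len(cluster[0])
        mean = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in cluster for d in range(dim))
    return total


def optimal_kmeans_cost(points, k):
    """Exact optimum by brute force over all k**n labelings (illustration only)."""
    return min(
        kmeans_cost(points, labels, k)
        for labels in product(range(k), repeat=len(points))
    )


def is_separated(points, k, alpha):
    """Assumed reading of the condition: OPT_{k-1} >= (1 + alpha) * OPT_k."""
    opt_k = optimal_kmeans_cost(points, k)
    opt_k_minus_1 = optimal_kmeans_cost(points, k - 1)
    return opt_k_minus_1 >= (1 + alpha) * opt_k


if __name__ == "__main__":
    # Two well-separated pairs of points in the plane (made-up data).
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(is_separated(pts, k=2, alpha=0.5))
```

On inputs like the two well-separated pairs above, merging the clusters is far more expensive than keeping them apart, so the condition holds for a wide range of α; the abstract's point is that any constant α > 0 already suffices for a PTAS.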

History

Publisher Statement

All Rights Reserved

Date

1984-01-01
