Posted on 2002-05-01, 00:00. Authored by Pranjal Awasthi, Avrim Blum, Or Sheffet.
Optimal clustering is a notoriously hard task. Recently, several papers have suggested a new
approach to clustering, motivated by examining natural assumptions that arise in practice, or that
are made implicitly by many standard algorithmic approaches. These assumptions concern various
measures of stability of our given clustering instance. The work of Bilu and Linial [BL10] refers
to stability with respect to perturbations in the metric space, and gives positive results for inputs
stable to perturbations of size O(n^{1/3}) for max-cut-based clustering. The work of Balcan et
al. [BBG09] refers to stability with respect to approximations of the objective, and gives positive
results for inputs such that all (1 + α)-approximations to the k-median (or k-means) optimal
solution are close, as partitions of the data, to the actual desired clustering. Related assumptions
are considered by Ostrovsky et al. [ORSS06].
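The Bilu-Linial notion above can be made concrete with a small sketch (illustrative only, not from the paper): an instance is γ-stable if its optimal clustering is unchanged under any perturbation that multiplies each pairwise distance by a factor in [1, γ]. The toy code below brute-forces the optimal k-median partition on a tiny instance and samples random γ-perturbations; sampling can only refute stability, never certify it, and all names here (`kmedian_partition`, `looks_gamma_stable`) are hypothetical.

```python
# Illustrative sketch of Bilu-Linial-style perturbation stability for
# k-median on a tiny instance. Sampling random perturbations is a
# heuristic check: it can find a counterexample but cannot prove stability.
import random
from itertools import combinations

def kmedian_partition(D, k):
    """Brute-force optimal k-median partition; centers drawn from the points."""
    n = len(D)
    best_cost, best_part = float("inf"), None
    for centers in combinations(range(n), k):
        cost = sum(min(D[i][c] for c in centers) for i in range(n))
        if cost < best_cost:
            groups = {}
            for i in range(n):
                nearest = min(centers, key=lambda c: D[i][c])
                groups.setdefault(nearest, set()).add(i)
            best_cost, best_part = cost, frozenset(frozenset(g) for g in groups.values())
    return best_part

def looks_gamma_stable(D, k, gamma, trials=200, seed=0):
    """Sample random gamma-perturbations; return False on any counterexample."""
    rng = random.Random(seed)
    n = len(D)
    target = kmedian_partition(D, k)
    for _ in range(trials):
        P = [row[:] for row in D]
        for i in range(n):
            for j in range(i + 1, n):
                # scale each distance by an independent factor in [1, gamma];
                # the perturbed instance need not remain a metric
                P[i][j] = P[j][i] = D[i][j] * rng.uniform(1.0, gamma)
        if kmedian_partition(P, k) != target:
            return False
    return True

# Two well-separated groups on a line: intra-cluster distances are tiny
# relative to inter-cluster ones, so the optimum survives 2-perturbations.
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
D = [[abs(a - b) for b in xs] for a in xs]
print(looks_gamma_stable(D, k=2, gamma=2.0))
```

On this instance every sampled 2-perturbation keeps the two groups intact, matching the intuition that stability to constant-magnitude perturbations is a strong separation condition.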
In this paper we extend these directions in several important ways. For the Bilu-Linial assumption,
we show that for center-based clustering objectives (such as k-median, k-means, and k-center) we
can efficiently find the optimal clustering assuming only stability to constant-magnitude perturbations
of the underlying metric. For the approximation assumption of Balcan et al., we relax their
condition to require good behavior only for “natural” (1 + α)-approximations of the objective:
specifically, to make assumptions only about approximations in which each point is assigned to
its nearest center. This relaxed assumption now allows for clusters in the optimum solution to be
quite close together. Nonetheless, we show that for the k-median objective, for any α > 0, under
the assumption that all such approximations yield the same partitioning, it is still possible to obtain
an optimal solution in polynomial time. In addition, we show extensions of this result to infinite
metric spaces for any α ≥ 1.
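The relaxed condition on "natural" approximations can likewise be sketched on a toy instance (again illustrative, not the paper's algorithm; the helper name `natural_partitions` is hypothetical): enumerate every center set whose k-median cost is within a (1 + α) factor of optimal, assign each point to its nearest center, and test whether all such center sets induce the same partition of the data.

```python
# Toy check of the relaxed approximation-stability condition: do all
# "natural" (1 + alpha)-approximations (nearest-center assignments of
# near-optimal center sets) induce the same partition?
from itertools import combinations

def natural_partitions(xs, k, alpha):
    D = [[abs(a - b) for b in xs] for a in xs]
    n = len(xs)
    # k-median cost of every possible center set (centers drawn from points)
    costs = {c: sum(min(D[i][j] for j in c) for i in range(n))
             for c in combinations(range(n), k)}
    opt = min(costs.values())
    parts = set()
    for centers, cost in costs.items():
        if cost <= (1 + alpha) * opt:
            # natural approximation: each point goes to its nearest center
            groups = {}
            for i in range(n):
                nearest = min(centers, key=lambda c: D[i][c])
                groups.setdefault(nearest, set()).add(i)
            parts.add(frozenset(frozenset(g) for g in groups.values()))
    return parts

xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(len(natural_partitions(xs, k=2, alpha=0.5)))
```

Here every center set within a factor 1.5 of optimal picks one center per group, so all natural approximations induce the same two clusters and the relaxed assumption holds for this α; note that unlike the original Balcan et al. condition, nothing is required of near-optimal solutions that assign points to non-nearest centers.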