Maria-Florina Balcan, Avrim Blum, Anupam Gupta
The design of approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much effort spent on understanding the approximation guarantees achievable for objective functions such as k-median, k-means, and min-sum clustering.
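
For concreteness, over a metric space (S, d) these objectives take the following standard forms (the notation below is ours, not the paper's): a clustering partitions S into clusters C_1, ..., C_k, with k-median and k-means additionally assigning each cluster C_j a center c_j.

\[
\Phi_{\text{k-median}} = \sum_{j=1}^{k} \sum_{x \in C_j} d(x, c_j), \qquad
\Phi_{\text{k-means}} = \sum_{j=1}^{k} \sum_{x \in C_j} d(x, c_j)^2, \qquad
\Phi_{\text{min-sum}} = \sum_{j=1}^{k} \sum_{x, y \in C_j} d(x, y).
\]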

This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also yield more accurate clusterings. For example, in problems such as clustering proteins by function or clustering images by subject, there is some unknown “correct” target clustering, and the hope is that approximately optimizing these objective functions will in fact produce a clustering that is close (in symmetric difference) to the truth.
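
One standard way to make “close in symmetric difference” precise (the notation here is ours, and the exact normalization may differ from the paper's): for two k-clusterings C = {C_1, ..., C_k} and C' = {C'_1, ..., C'_k} of the same n points, the distance is the smallest fraction of points that must be reassigned to turn one clustering into the other,

\[
\mathrm{dist}(\mathcal{C}, \mathcal{C}') = \min_{\sigma \in S_k} \frac{1}{n} \sum_{i=1}^{k} \bigl| C_i \setminus C'_{\sigma(i)} \bigr|,
\]

where S_k is the set of permutations of {1, ..., k}. Being ε-close to the target then means this distance is at most ε.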

In this paper, we show that if we make this implicit assumption explicit (that is, if we assume that any c-approximation to the given clustering objective F is ε-close to the target), then we can produce clusterings that are O(ε)-close to the target, even for values of c for which obtaining a c-approximation is NP-hard. In particular, for the k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do so for any constant c > 2.
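
Stated compactly (in our notation, writing OPT_F for the optimal value of the objective F and C_T for the target clustering), the assumption and the resulting guarantee read:

\[
\textbf{Assumption:}\quad \forall\, \mathcal{C}:\; F(\mathcal{C}) \le c \cdot \mathrm{OPT}_F \;\Longrightarrow\; \mathrm{dist}(\mathcal{C}, \mathcal{C}_T) \le \varepsilon,
\]
\[
\textbf{Guarantee:}\quad \text{an efficient algorithm outputs } \mathcal{C}_{\mathrm{alg}} \text{ with } \mathrm{dist}(\mathcal{C}_{\mathrm{alg}}, \mathcal{C}_T) = O(\varepsilon).
\]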

Our results also highlight a somewhat surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target and assuming that any approximately optimal solution is ε-close to the target, even for an approximation factor as small as c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, yet in the latter case we have an efficient algorithm.