Posted on 2006-01-01, 00:00; authored by Fernando De la Torre, Takeo Kanade
Clustering is one of the most widely used statistical
tools for data analysis. Among all
existing clustering techniques, k-means is a
very popular method because of its ease of
programming and because it accomplishes a
good trade-off between achieved performance
and computational complexity. However, k-means
is prone to local minima, and
it does not scale well to high-dimensional
data sets. A common approach to dealing
with high-dimensional data is to cluster in the
space spanned by the principal components
(PC). In this paper, we show the benefits of
clustering in a low-dimensional discriminative
space rather than in the generative PC space.
In particular, we propose a new clustering
algorithm called Discriminative Cluster
Analysis (DCA). DCA jointly performs dimensionality
reduction and clustering. Several
toy and real examples show the benefits
of DCA versus traditional PCA+k-means
clustering. Additionally, a new matrix formulation
is suggested and connections with
related techniques such as spectral graph
methods and linear discriminant analysis are
provided.
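
For orientation, the PCA+k-means baseline that the abstract compares against can be sketched as follows. This is a minimal illustration under stated assumptions, not the DCA algorithm and not the authors' code: it uses plain NumPy, PCA via the singular value decomposition, and standard Lloyd iterations for k-means. The function names (pca_project, kmeans, pca_kmeans) and the parameters n_components, n_iter and seed are hypothetical and chosen only for this example.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center the data
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, n_iter=100, seed=0):
    """Standard Lloyd iterations; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers with k distinct data points.
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def pca_kmeans(X, n_components, k, seed=0):
    """Baseline: reduce dimensionality with PCA, then cluster with k-means."""
    Z = pca_project(X, n_components)
    labels, _ = kmeans(Z, k, seed=seed)
    return labels
```

Note that the two steps here are independent: the projection is chosen to preserve variance, without regard to cluster structure. This is the decoupling that the abstract contrasts with DCA, which performs dimensionality reduction and clustering jointly.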