Source Constrained Clustering
We consider the problem of quantizing data generated from disparate sources, e.g., subjects performing actions in different styles, movies with a particular genre bias, or images of objects taken under varying conditions. In such scenarios, unsupervised clustering produces inadequate codebooks, because algorithms like K-means tend to group samples according to source-specific biases (e.g., clustering by subject) rather than grouping similar samples across sources (e.g., clustering by action). We propose a new quantization technique, Source Constrained Clustering (SCC), which extends the K-means algorithm by constraining clusters to contain samples from multiple sources. We evaluate the method in the context of activity recognition from videos recorded in unconstrained environments. Experiments on several tasks and features show that using source information improves classification performance.
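The abstract does not specify how SCC enforces the multi-source constraint. Purely as an illustration, and not as the authors' algorithm, the Python sketch below assumes one possible constraint: a source with `n_s` samples may contribute at most `ceil(slack * n_s / k)` of them to any single cluster. This forces each source's samples to spread over several clusters, so clusters cannot simply mirror sources. The function `source_constrained_kmeans` and its parameters `slack`, `n_iter`, and `seed` are hypothetical names introduced here.

```python
import numpy as np


def source_constrained_kmeans(X, sources, k, slack=2.0, n_iter=20, seed=0):
    """Hypothetical sketch of a source-constrained K-means (not the SCC paper's method).

    X: (n, d) float array of samples.
    sources: (n,) int array giving the source id of each sample.
    k: number of clusters (must not exceed n).
    slack: capacity multiplier (>= 1); each source may place at most
           ceil(slack * n_s / k) of its n_s samples in any one cluster.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialize centroids from k distinct random samples.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Constrained assignment step, done independently for each source.
        for s in np.unique(sources):
            idx = np.flatnonzero(sources == s)
            cap = int(np.ceil(slack * len(idx) / k))
            remaining = np.full(k, cap)  # k * cap >= n_s, so all samples fit
            # Greedy: samples closest to some centroid pick first.
            order = idx[np.argsort(d2[idx].min(axis=1))]
            for i in order:
                # Take the nearest cluster that still has capacity for source s.
                for c in np.argsort(d2[i]):
                    if remaining[c] > 0:
                        labels[i] = c
                        remaining[c] -= 1
                        break
        # Standard K-means update; keep the old centroid if a cluster is empty.
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return labels, centroids
```

Only the assignment step differs from Lloyd's K-means: the nearest-centroid rule is replaced by a greedy, capacity-limited assignment per source, while the centroid update is unchanged. The greedy step is a heuristic and does not solve the constrained assignment problem optimally. With `slack=1` and `n_s` divisible by `k`, every cluster receives exactly `n_s / k` samples from each source; larger `slack` values relax the constraint toward ordinary K-means.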