Posted on 2005-01-01, 00:00; authored by Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman
We seek to discover the object categories depicted in a set of
unlabelled images. We achieve this using a model developed
in the statistical text literature: probabilistic Latent Semantic
Analysis (pLSA). In text analysis this is used to discover
topics in a corpus using the bag-of-words document representation.
Here we treat object categories as topics, so that
an image containing instances of several categories is modeled
as a mixture of topics.
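
As an illustration of the mixture-of-topics idea, the following is a minimal sketch of generic pLSA fitted by EM, where each document d (here, an image) has word distribution P(w|d) = sum over z of P(w|z) P(z|d). It is not the authors' implementation: the function name `plsa`, its parameters, the random initialisation, and the toy count matrix are all assumptions made for this example.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM (illustrative sketch, hypothetical interface).

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_w_z, p_z_d): P(w|z) per topic and P(z|d) per document.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialisation of both distributions (an assumption).
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Small floor keeps every probability strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) proportional to P(w|z) P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy data (made up): two documents use words 0-1, two use words 2-3.
counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_w_z, p_z_d = plsa(counts, n_topics=2)
```

Each row of `p_z_d` is the topic mixture for one document, which is the quantity that would let an image containing several categories be described as a mixture of topics.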
The model is applied to images by using a visual analogue
of a word, formed by vector quantizing SIFT-like region
descriptors. The topic discovery approach successfully
translates to the visual domain: for a small set of objects,
we show that both the object categories and their approximate
spatial layout are found without supervision. Performance
of this unsupervised method is compared to the
supervised approach of Fergus et al. [8] on a set of unseen
images containing only one object per image.
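
The visual analogue of a word mentioned above can be sketched as follows: each region descriptor is assigned to its nearest entry in a vocabulary of cluster centres, and an image becomes a histogram of those assignments. This is a minimal sketch under stated assumptions, not the authors' pipeline: the vocabulary is taken as given (for instance from a clustering step), the two-dimensional descriptors are toy stand-ins for SIFT-like descriptors, and the names `nearest_word` and `bag_of_words` are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary centre closest to the descriptor
    (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, centre in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centre))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def bag_of_words(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1
    return hist


# Toy example (made-up numbers): two visual words, three region descriptors.
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 11.0], [0.0, 2.0]]
print(bag_of_words(descriptors, vocabulary))  # prints [2, 1]
```

Stacking one such histogram per image gives a document-by-word count matrix of the kind a topic model takes as input.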
We also extend the bag-of-words vocabulary to include
‘doublets’ which encode spatially local co-occurring regions.
We demonstrate that this extended vocabulary
gives a cleaner image segmentation. Finally, the classification
and segmentation methods are applied to a set of
images containing multiple objects per image. These results
demonstrate that we can successfully build object class
models from an unsupervised analysis of images.