Posted on 2005-01-01, 00:00. Authored by Ranjith Unnikrishnan, Caroline Pantofaru, Martial Hebert
Despite significant advances in image segmentation techniques,
evaluation of these techniques thus far has been
largely subjective. Typically, the effectiveness of a new algorithm
is demonstrated only by the presentation of a few
segmented images and is otherwise left to subjective evaluation
by the reader. Little effort has been spent on the design
of perceptually correct measures to compare an automatic
segmentation of an image to a set of hand-segmented examples
of the same image. This paper demonstrates how
a modification of the Rand index, the Normalized Probabilistic
Rand (NPR) index, meets the requirements of large-scale
performance evaluation of image segmentation. We
show that the measure has a clear probabilistic interpretation
as the maximum likelihood estimator of an underlying
Gibbs model, can be correctly normalized to account for
the inherent similarity in a set of ground truth images, and
can be computed efficiently for large datasets. Results are
presented on images from the publicly available Berkeley
Segmentation dataset.
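
The abstract does not state the formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the probabilistic Rand index scores each pixel pair by how often the hand segmentations agree with the test segmentation on whether the two pixels share a label, and that normalization rescales the index against an expected value supplied by the caller. The function names (`same_label_pairs`, `rand_index`, `probabilistic_rand`, `normalize`) are hypothetical, and the brute-force enumeration of all pixel pairs is suitable only for tiny label maps; it is not the efficient computation the abstract refers to.

```python
import numpy as np

def same_label_pairs(labels):
    """Boolean vector over all pixel pairs i < j: True where both pixels share a label."""
    flat = np.asarray(labels).ravel()
    i, j = np.triu_indices(flat.size, k=1)
    return flat[i] == flat[j]

def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which two segmentations agree (same vs. different label)."""
    return float(np.mean(same_label_pairs(seg_a) == same_label_pairs(seg_b)))

def probabilistic_rand(test_seg, ground_truths):
    """Assumed form: average pairwise agreement of test_seg with a set of hand segmentations."""
    c = same_label_pairs(test_seg)
    # p[k] = fraction of hand segmentations that put pixel pair k in the same segment
    p = np.mean([same_label_pairs(g) for g in ground_truths], axis=0)
    return float(np.mean(np.where(c, p, 1.0 - p)))

def normalize(index, expected, maximum=1.0):
    """Assumed normalization: rescale so the expected value maps to 0 and the maximum to 1."""
    return (index - expected) / (maximum - expected)

# Toy 2x2 label maps (illustrative data only, not from the paper)
test_seg = [[0, 0], [1, 1]]
gt_1 = [[0, 0], [1, 1]]
gt_2 = [[0, 0], [0, 1]]
pr = probabilistic_rand(test_seg, [gt_1, gt_2])
```

With this form, the unnormalized index equals the mean of the ordinary Rand indices between the test segmentation and each hand segmentation. The expected value passed to `normalize` is left to the caller here, since how it should be estimated is not described in the abstract.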