posted on 2006-10-14, 00:00authored byCees G.M. Snoek, Alexander Hauptmann
We focus on the problem of learning semantics from multimedia data associated with broad-
cast video documents. In this paper we propose to learn semantic concepts from multimodal
sources based on style and context detectors, in combination with statistical classifier ensembles. As a case study we present our method for detecting the concept of news subject
monologues. This approach had the best average precision performance amongst 26 sub-
missions in the 2003 video track of the Text Retrieval Conference benchmark. Experiments
were conducted with respect to individual detector contribution, ensemble size, and ranking
mechanism. It was found that the combination of detectors is decisive for the final result,
although some detectors might appear useless in isolation. Moreover, by using a probabilistic
ranking, in combination with a large classfier ensemble, results can be improved even further.
History
Publisher Statement
Appears in proceedings of AAAI 2006 Fall Symposium on Integrating Reasoning into Everyday Applications