How many high-level concepts will fill the semantic gap in video retrieval?
Journal contribution posted on 01.01.1989, 00:00 by Alexander Hauptmann, Rong Yan, Wei-Hao Lin
A number of researchers have been building detectors for high-level semantic concepts such as outdoors, face, and building to help with semantic video retrieval. Using the TRECVID video collection and LSCOM truth annotations for 300 concepts, we simulate the performance of video retrieval under different assumptions of concept detection accuracy. Even low detection accuracy provides good retrieval results when sufficiently many concepts are used. Extrapolating under reasonable assumptions, this paper concludes that "concept-based" video retrieval with fewer than 5000 concepts, detected with a minimal accuracy of 10% mean average precision, is likely to provide high-accuracy results, comparable to text retrieval on the web, in a typical broadcast news collection. We also derive evidence that it should be feasible to find sufficiently many new, useful concepts that would be helpful for retrieval.
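
The abstract states detection accuracy in terms of mean average precision. As a reading aid, the following is a minimal sketch of the textbook, non-interpolated form of that metric. It is not the paper's evaluation code, and TRECVID evaluations may use a different variant. The function names and the boolean-list input format are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Non-interpolated average precision for one ranked result list.

    ranked_relevance[i] is True when the item at rank i + 1 is relevant.
    total_relevant is the number of relevant items in the whole collection;
    if omitted, it is assumed that all relevant items appear in the list.
    """
    if total_relevant is None:
        total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / total_relevant


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list average precision over several ranked lists."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)
```

For example, a ranked list with relevant items at ranks 1 and 3 out of four results has an average precision of (1/1 + 2/3) / 2, about 0.83. A mean average precision of 10% therefore corresponds to ranked lists in which relevant items are, on average, heavily outnumbered by irrelevant ones near the top.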