Posted on 2005-01-01, 00:00. Authored by Yan Ke, Rahul Sukthankar, Martial Hebert.
This paper studies the use of volumetric features as an alternative
to popular local descriptor approaches for event
detection in video sequences. Motivated by the recent success
of similar ideas in object detection on static images,
we generalize the notion of 2D box features to 3D spatiotemporal
volumetric features. This general framework enables
us to perform real-time video analysis. We construct a real-time
event detector for each action of interest by learning
a cascade of filters based on volumetric features that efficiently
scans video sequences in space and time. This event
detector recognizes actions that are traditionally problematic
for interest-point methods, such as smooth motions
for which too few space-time interest points are available.
Our experiments demonstrate that the technique accurately
detects actions on real-world sequences and is robust to
changes in viewpoint, scale and action speed. We also adapt
our technique to the related task of human action classification
and confirm that it achieves performance comparable
to a current interest-point-based human activity recognizer
on a standard database of human activities.
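
To make the step from 2D box features to 3D volumetric features concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual mechanism behind 2D box features, a summed-area table, extended by one axis for time so that the sum over any space-time box costs eight lookups. The abstract does not specify this mechanism, and the names `integral_volume`, `box_sum`, and `temporal_split_feature` are hypothetical, as is the choice of a two-box feature split along the time axis.

```python
def integral_volume(video):
    """Build a zero-padded cumulative-sum table for video[t][y][x].

    ii[t][y][x] holds the sum of all samples with indices strictly below
    (t, y, x), so the table has shape (T+1) x (H+1) x (W+1).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    ii = [[[0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                # 3D inclusion-exclusion over the seven previously filled neighbours.
                ii[t + 1][y + 1][x + 1] = (
                    video[t][y][x]
                    + ii[t][y + 1][x + 1] + ii[t + 1][y][x + 1] + ii[t + 1][y + 1][x]
                    - ii[t][y][x + 1] - ii[t][y + 1][x] - ii[t + 1][y][x]
                    + ii[t][y][x]
                )
    return ii


def box_sum(ii, t0, y0, x0, t1, y1, x1):
    """Sum of the half-open box [t0, t1) x [y0, y1) x [x0, x1) in eight lookups."""
    return (
        ii[t1][y1][x1]
        - ii[t0][y1][x1] - ii[t1][y0][x1] - ii[t1][y1][x0]
        + ii[t0][y0][x1] + ii[t0][y1][x0] + ii[t1][y0][x0]
        - ii[t0][y0][x0]
    )


def temporal_split_feature(ii, t0, y0, x0, t1, y1, x1):
    """Hypothetical two-box feature: later half of the box minus earlier half."""
    tm = (t0 + t1) // 2
    earlier = box_sum(ii, t0, y0, x0, tm, y1, x1)
    later = box_sum(ii, tm, y0, x0, t1, y1, x1)
    return later - earlier
```

Because the table is built once per clip, each feature evaluation takes constant time regardless of the box size, which is what makes exhaustive scanning in space and time plausible at real-time rates. A cascade would evaluate many such features per window and reject most windows early.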