Posted on 2007-01-01, 00:00. Authored by Yan Ke, Rahul Sukthankar, Martial Hebert.
Real-world actions often occur in crowded, dynamic environments.
This poses a challenge for current approaches to video event
detection because distracting motion from other objects in the
scene makes it difficult to segment the actor from the
background. We propose a
technique for event recognition in crowded videos that reliably
identifies actions in the presence of partial occlusion
and background clutter. Our approach is based on three
key ideas: (1) we efficiently match the volumetric representation
of an event against oversegmented spatio-temporal
video volumes; (2) we augment our shape-based features
using flow; (3) rather than treating an event template as an
atomic entity, we match its parts separately (in both space
and time), which provides robustness against occlusions and actor
variability. Our experiments on human actions, such
as picking up a dropped object or waving in a crowd, show
reliable detection with few false positives.
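
To illustrate why matching a template by parts can tolerate occlusion better than matching it as one atomic entity, the following is a minimal sketch and not the authors' algorithm. It assumes that volumes are plain sets of (x, y, t) voxels, that a part's score is the fraction of its voxels found in the video, and that the part-based score averages the best `keep` parts. All names here (`part_score`, `whole_score`, `parts_score`, `keep`) are hypothetical and chosen only for this example. Flow features and oversegmentation are omitted.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, t): two spatial axes plus time


def shift(part: Set[Voxel], offset: Voxel) -> Set[Voxel]:
    """Translate a set of voxels by an (x, y, t) offset."""
    dx, dy, dt = offset
    return {(x + dx, y + dy, t + dt) for (x, y, t) in part}


def part_score(part: Set[Voxel], video: Set[Voxel], offset: Voxel) -> float:
    """Fraction of the shifted part's voxels that are present in the video."""
    placed = shift(part, offset)
    if not placed:
        return 0.0  # an empty part carries no evidence
    return len(placed & video) / len(placed)


def whole_score(parts: List[Set[Voxel]], video: Set[Voxel], offset: Voxel) -> float:
    """Score the template as a single atomic volume (union of all parts)."""
    template = set().union(*parts)
    return part_score(template, video, offset)


def parts_score(parts: List[Set[Voxel]], video: Set[Voxel],
                offset: Voxel, keep: int) -> float:
    """Score each part separately and average the best `keep` scores.

    Dropping the worst parts is an assumption made for this sketch: it lets
    an occluded part be ignored instead of dragging down the whole match.
    """
    scores = sorted((part_score(p, video, offset) for p in parts), reverse=True)
    return sum(scores[:keep]) / keep


# Toy template: an "upper" and a "lower" part, each 2 x 2 x 2 voxels.
upper = {(x, y, t) for x in range(2) for y in range(2) for t in range(2)}
lower = {(x, y, t) for x in range(2) for y in range(2, 4) for t in range(2)}
parts = [upper, lower]

# Toy video: the actor appears at offset (5, 5, 0), but the lower part is
# fully occluded, so only the upper part's voxels are visible.
video = shift(upper, (5, 5, 0))

atomic = whole_score(parts, video, (5, 5, 0))
by_parts = parts_score(parts, video, (5, 5, 0), keep=1)
print(atomic, by_parts)
```

In this toy case the atomic match sees only half of the template, while the part-based match still finds one part completely. A real detector would also need to search over offsets and constrain the relative placement of the parts, which this sketch does not attempt.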