Structured Prediction for Event Detection
This chapter describes Segment-based SVMs (SegSVMs), a framework for event detection. SegSVMs combine energy-based structured prediction, maximum-margin learning, and a Bag-of-Words (BoW) representation. Unlike traditional event detection approaches based on Dynamic Bayesian Networks, SegSVMs have a convex learning formulation, and inference over multiple events can be performed efficiently in linear time. Beyond detecting a single event, SegSVMs can be extended to solve two relatively unexplored problems in computer vision: early event detection and sequence labeling of multiple events. We illustrate the benefits of SegSVMs in several computer vision applications, namely facial action unit detection, early recognition of hand gestures, early detection of facial expressions, and sequence labeling of human actions.