
Learning Policies for Contextual Submodular Prediction

Journal contribution, posted on 2013-05-01, authored by Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, and J. Andrew Bagnell

Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learns a sequence of classifiers or relies on stronger assumptions such as realizability, we ensure both data efficiency and performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems, including manipulator trajectory optimization, news recommendation, and document summarization.
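To make the setup concrete, the sketch below illustrates the general idea described in the abstract: a single online learner is used to greedily build a list, and at every list position it is updated with the observed marginal gains of candidate items. This is a minimal illustration only; the featurization, the toy squared-loss learner, the greedy_list/train helpers, and the marginal_gain oracle are assumptions for the sake of the example, not the authors' algorithm or code.

import numpy as np

def features(context, chosen, item):
    # Hypothetical featurization of (context, partial list, candidate item);
    # a real system would use task-specific features. Assumes numeric items.
    return np.concatenate([context, [float(len(chosen)), float(item)]])

class OnlineLearner:
    """Toy online linear regressor standing in for a no-regret learner."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr

    def score(self, x):
        return float(self.w @ x)

    def update(self, x, target):
        # Squared-loss gradient step toward the observed marginal gain.
        self.w += self.lr * (target - self.score(x)) * x

def greedy_list(learner, context, items, budget):
    """Build a list greedily, scoring every candidate with the single learner."""
    chosen = []
    for _ in range(budget):
        remaining = [i for i in items if i not in chosen]
        chosen.append(max(remaining,
                          key=lambda i: learner.score(features(context, chosen, i))))
    return chosen

def train(learner, contexts, items, budget, marginal_gain):
    """One online pass: at each position of the list being built, update the
    learner with the marginal gain of every remaining candidate, then extend
    the list with the learner's current best pick."""
    for context in contexts:
        chosen = []
        for _ in range(budget):
            for i in items:
                if i in chosen:
                    continue
                x = features(context, chosen, i)
                learner.update(x, marginal_gain(context, chosen, i))
            remaining = [i for i in items if i not in chosen]
            chosen.append(max(remaining,
                              key=lambda i: learner.score(features(context, chosen, i))))

The key point the sketch tries to convey is that one learner is shared across all list positions, rather than training a separate classifier per position; the paper's contribution is the analysis showing such a single no-regret learner suffices in the fully agnostic setting.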

History

Publisher Statement

Copyright 2013 by the author(s)

Date

2013-05-01
