Posted on 2000-02-01, 00:00. Authored by Geoffrey J. Gordon.
Abstract: "No-regret algorithms are a popular class of online learning rules. Unfortunately, most no-regret algorithms assume that the set Y of allowable hypotheses is small and discrete. We consider instead prediction problems where Y has internal structure: Y might be the set of strategies in a game like poker, the set of paths in a graph, or the set of configurations of a data structure like a rebalancing binary search tree; or Y might be a given convex set (the 'online convex programming' problem) or, more generally, an arbitrary bounded set. We derive a family of no-regret learning rules, called Lagrangian Hedging algorithms, to take advantage of this structure. Our algorithms are a direct generalization of known no-regret learning rules like weighted majority and external-regret matching. In addition to proving regret bounds, we demonstrate one of our algorithms learning to play one-card poker."
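For context, the discrete-hypothesis baseline the abstract contrasts against is the multiplicative-weights family, of which weighted majority (Hedge) is the canonical member. The Python sketch below illustrates that classical update on a small discrete hypothesis set; it is not the paper's Lagrangian Hedging algorithm, and the function name, learning rate eta, and loss encoding are our own illustrative choices.

```python
import numpy as np

def hedge(losses, eta=0.1):
    """Exponential-weights (Hedge) update over a small discrete
    hypothesis set.  losses is a (T, K) array: losses[t, k] is the
    loss of hypothesis k at round t.  Returns the (T, K) sequence of
    weight vectors played.  Illustrative sketch only."""
    T, K = losses.shape
    w = np.ones(K) / K            # start from the uniform distribution
    history = np.empty((T, K))
    for t in range(T):
        history[t] = w
        # Multiplicative-weights step: down-weight each hypothesis
        # in proportion to the loss it just incurred.
        w = w * np.exp(-eta * losses[t])
        w = w / w.sum()           # renormalize to a distribution
    return history

# Usage: the algorithm's cumulative loss tracks the best fixed
# hypothesis in hindsight, so the regret grows sublinearly in T.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(1000, 5))
weights = hedge(losses)
alg_loss = (weights * losses).sum()
best_fixed = losses.sum(axis=0).min()
print(f"regret: {alg_loss - best_fixed:.2f}")
```

Updates of this form assume the weight vector can be stored and renormalized explicitly, which is exactly what breaks down when Y is a structured or continuous set as described in the abstract.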