Avrim Blum, Yishay Mansour
External regret compares the performance of an online algorithm, selecting among N actions, to
the performance of the best of those actions in hindsight. Internal regret compares the loss of an
online algorithm to the loss of a modified online algorithm that consistently replaces one action
with another.
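In symbols (a standard formulation; the notation is ours, not quoted from the paper), if the algorithm plays a distribution $p_t$ over the $N$ actions at time $t$ and action $i$ then incurs loss $\ell_t(i)$:

\[
R_{\mathrm{external}} \;=\; \sum_{t=1}^{T} \sum_{i=1}^{N} p_t(i)\,\ell_t(i) \;-\; \min_{j \in [N]} \sum_{t=1}^{T} \ell_t(j),
\qquad
R_{\mathrm{internal}} \;=\; \max_{i \neq j} \sum_{t=1}^{T} p_t(i)\,\bigl(\ell_t(i) - \ell_t(j)\bigr).
\]

The internal regret expression measures how much the algorithm would have saved had every play of action $i$ been replaced by action $j$, for the best such pair $(i, j)$ in hindsight.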
In this paper we give a simple generic reduction that, given an algorithm for the external regret
problem, converts it into an efficient online algorithm for the internal regret problem. We provide
methods that work in both the full information model, in which the loss of every action is observed
at each time step, and the partial information (bandit) model, in which only the loss of the selected
action is observed at each time step. The importance of internal regret in game theory stems from the
fact that in a general game, if every player has sublinear internal regret, then the empirical
frequencies of play converge to the set of correlated equilibria.
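As a concrete illustration of the reduction in the full information model, here is a minimal sketch in Python. It assumes losses in [0, 1] and plugs in multiplicative weights as the external-regret algorithm; the class names and the use of power iteration for the stationary distribution are our choices for illustration, not prescribed by the paper.

import numpy as np

class MWLearner:
    """Multiplicative-weights external-regret learner (a standard choice;
    the reduction works with any external-regret algorithm)."""
    def __init__(self, n_actions, eta=0.1):
        self.w = np.ones(n_actions)
        self.eta = eta

    def distribution(self):
        return self.w / self.w.sum()

    def update(self, loss_vec):
        # Losses are assumed to lie in [0, 1]; the scaled losses below stay in [0, 1].
        self.w *= np.exp(-self.eta * loss_vec)

class SwapRegretReduction:
    """Runs N copies of an external-regret learner; copy i is responsible
    for the rounds in which action i is played."""
    def __init__(self, n_actions, eta=0.1):
        self.n = n_actions
        self.copies = [MWLearner(n_actions, eta) for _ in range(n_actions)]

    def play(self):
        # Row i of Q is the distribution recommended by copy i.
        Q = np.array([c.distribution() for c in self.copies])
        # Play the stationary distribution p = pQ, found here by power iteration.
        p = np.full(self.n, 1.0 / self.n)
        for _ in range(1000):
            p_next = p @ Q
            if np.allclose(p_next, p, atol=1e-12):
                break
            p = p_next
        self._p = p
        return p

    def observe(self, loss_vec):
        # Copy i receives the loss vector scaled by p(i), so the copies'
        # total loss equals the algorithm's expected loss.
        for i, c in enumerate(self.copies):
            c.update(self._p[i] * loss_vec)

# Example: T rounds of random losses.
rng = np.random.default_rng(0)
alg = SwapRegretReduction(n_actions=5)
for _ in range(1000):
    p = alg.play()
    alg.observe(rng.random(5))

The key step is playing the stationary distribution p of the matrix Q whose rows are the copies' recommendations: since p = pQ, the loss the algorithm actually incurs equals the total loss attributed to the copies, so copy i's external regret against a fixed action j bounds the gain from swapping action i for action j.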
For external regret we also derive a quantitative bound in a very general setting that allows an
arbitrary set of modification rules (each of which may modify the actions chosen by the online
algorithm) and an arbitrary set of time selection functions (each assigning a different weight to
each time step). The regret with respect to a given time selection function and modification rule is
the difference between the cost of the online algorithm and the cost of the modified algorithm, where
both costs are weighted by the time selection function. This can be viewed as a generalization of the
previously studied sleeping experts setting.
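One way to write this generalized regret, reusing the notation introduced above (the paper's exact formulation may differ in details): let $I : \{1,\dots,T\} \to [0,1]$ be a time selection function and $F : [N] \to [N]$ a modification rule. The corresponding regret is

\[
R_{I,F} \;=\; \sum_{t=1}^{T} I(t) \sum_{i=1}^{N} p_t(i)\,\bigl(\ell_t(i) - \ell_t(F(i))\bigr).
\]

Taking $I \equiv 1$ and letting $F$ swap a single pair of actions recovers internal regret, while letting $I$ indicate the rounds on which a given expert is awake recovers the sleeping experts setting.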