Posted on 1993-10-01, 00:00 · Authored by H. Brendan McMahan, Geoffrey J. Gordon, Avrim Blum
Abstract
We investigate methods for planning in a Markov Decision Process where the cost function is chosen by an adversary after we fix our policy. As a running example, we consider a robot path planning problem where costs are influenced by sensors that an adversary places in the environment. We formulate the problem as a zero-sum matrix game where rows correspond to deterministic policies for the planning player and columns correspond to cost vectors the adversary can select. For a fixed cost vector, fast algorithms (such as value iteration) are available for solving MDPs. We develop efficient algorithms for matrix games where such best response oracles exist. We show that for our path planning problem these algorithms are at least an order of magnitude faster than direct solution of the linear programming formulation.
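The oracle-based idea in the abstract can be illustrated with a minimal sketch: maintain small sets of pure strategies for each player, solve the restricted game over those sets, and add each player's best response against the opponent's current mixed strategy until neither oracle improves. The code below is illustrative only, not the paper's algorithm as published: the tiny payoff matrix is invented, and fictitious play stands in for an exact solve of the restricted game.

```python
# Sketch of an oracle-based scheme for zero-sum matrix games:
# rows minimize cost, columns maximize. Only best-response oracles
# over the full strategy spaces are ever needed.

def fictitious_play(M, rows, cols, iters=5000):
    """Approximately solve the restricted game M[rows][cols] by
    fictitious play (a stand-in for an exact LP solve)."""
    rc = [0] * len(rows)   # row player's empirical play counts
    cc = [0] * len(cols)   # column player's empirical play counts
    rc[0] = cc[0] = 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mix.
        r = min(range(len(rows)),
                key=lambda i: sum(cc[j] * M[rows[i]][cols[j]]
                                  for j in range(len(cols))))
        c = max(range(len(cols)),
                key=lambda j: sum(rc[i] * M[rows[i]][cols[j]]
                                  for i in range(len(rows))))
        rc[r] += 1
        cc[c] += 1
    x = [v / sum(rc) for v in rc]
    y = [v / sum(cc) for v in cc]
    return x, y

def double_oracle(M):
    """Solve the zero-sum game M by growing strategy supports with
    best-response oracles, instead of solving the full game at once."""
    n_rows, n_cols = len(M), len(M[0])
    R, C = {0}, {0}                     # current strategy supports
    while True:
        rows, cols = sorted(R), sorted(C)
        x, y = fictitious_play(M, rows, cols)
        # Lift restricted mixed strategies to the full strategy space.
        xf = [0.0] * n_rows
        yf = [0.0] * n_cols
        for i, p in zip(rows, x):
            xf[i] = p
        for j, p in zip(cols, y):
            yf[j] = p
        # Best-response oracles over ALL pure strategies.
        br_row = min(range(n_rows),
                     key=lambda i: sum(yf[j] * M[i][j] for j in range(n_cols)))
        br_col = max(range(n_cols),
                     key=lambda j: sum(xf[i] * M[i][j] for i in range(n_rows)))
        if br_row in R and br_col in C:  # neither oracle improves: done
            value = sum(xf[i] * M[i][j] * yf[j]
                        for i in range(n_rows) for j in range(n_cols))
            return xf, yf, value
        R.add(br_row)
        C.add(br_col)

# Toy cost matrix with game value 1.5 (row minimizes, column maximizes);
# v converges to a value close to 1.5.
x, y, v = double_oracle([[0, 2], [3, 1]])
```

In the paper's setting the row oracle would be an MDP planner such as value iteration, so the row player never enumerates its (exponentially many) deterministic policies; here enumeration over a 2x2 matrix stands in for that oracle.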
Publisher Statement
The original publication is available at www.springerlink.com