Jeff Schneider, Weng-Keen Wong, Andrew Moore, Martin Riedmiller (1999)
Many interesting problems that are candidates for solving with reinforcement
learning (RL), such as power grids, network
switches, and traffic flow, also have properties that make distributed solutions desirable. We propose an algorithm for distributed reinforcement
learning based on distributing the
representation of the value function across nodes. Each node in the system only has the ability to
sense state locally, choose actions locally, and
receive reward locally (the goal of the system
is to maximize the sum of the rewards over all
nodes and over all time). However, each node
is allowed to give its neighbors the current
estimate of its value function for the states it
passes through. We present a value function
learning rule that uses this shared information
to let each node learn a value function estimating a weighted sum of future
rewards for all the nodes in the network. With
this representation, each node can choose
actions to improve the performance of the
overall system.
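As a rough sketch of the kind of update such a rule implies, the following TD(0)-style Python illustration has each node back up its local value table toward its own reward plus a discounted, weighted mixture of the value estimates communicated by its neighbors. The class, method names, and fixed weighting scheme here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

class DistributedValueNode:
    """Illustrative sketch of one node learning a distributed value function."""

    def __init__(self, node_id, n_states, weights, alpha=0.1, gamma=0.95):
        # weights: mixing weight for each node id (this node plus its
        # neighbors); assumed to sum to 1, so the learned values estimate a
        # weighted sum of future rewards across the network.
        self.node_id = node_id
        self.values = np.zeros(n_states)   # local value table V_i(s)
        self.weights = weights
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor

    def update(self, state, reward, next_state, neighbor_estimates):
        # neighbor_estimates: neighbor id -> that neighbor's communicated
        # estimate of the value of next_state. The node only senses its own
        # state and receives its own reward; everything else arrives as
        # these communicated value estimates.
        mixed = self.weights[self.node_id] * self.values[next_state]
        for j, v_j in neighbor_estimates.items():
            mixed += self.weights[j] * v_j
        # TD(0)-style backup toward local reward plus discounted mixture.
        td_target = reward + self.gamma * mixed
        self.values[state] += self.alpha * (td_target - self.values[state])
```

In this sketch, the only information a node sends its neighbors is its current value estimate for the states it passes through, matching the communication constraint described above.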
We demonstrate our algorithm on the
distributed control of a simulated power grid.
We compare it against other methods,
including the use of a global reward signal, nodes
that act locally with no communication, and
nodes that share reward (but not value
function) information with each other. Our
results show that the distributed value function
algorithm outperforms the others, and we
conclude with an analysis of what problems
are best suited for distributed value functions
and the new research directions opened up by
this work.