QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations

Kar, Soummya; Moura, José M. F.; Poor, H. Vincent

doi:10.1184/R1/6469190.v1

journal contribution

posted on 2012-04-01, 00:00 authored by Soummya Kar, José M. F. Moura, H. Vincent Poor

The paper considers a class of multi-agent Markov decision processes (MDPs) in which the network agents respond differently (as manifested by their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents’ objective being to minimize a network-averaged infinite-horizon discounted cost, the paper proposes a distributed version of Q-learning, QD-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is aware only of its local online cost data and that the inter-agent communication network is weakly connected, the proposed distributed scheme is shown to yield asymptotically, almost surely (a.s.), the desired value function and the optimal stationary control policy at each network agent. The analytical techniques developed in the paper to address the mixed time-scale stochastic dynamics of the consensus + innovations form, which arise as a result of the proposed interactive distributed scheme, are of independent interest.
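
As a rough illustration of what a consensus + innovations update of this kind might look like, the sketch below applies one step to the Q-value of a visited state-action pair at every agent. The abstract does not state the update rule, so the form used here is an assumption: a neighbor-disagreement (consensus) term combined with a standard Q-learning temporal-difference (innovation) term driven by each agent's local cost. The function name `qd_update`, its parameters, and the constant weights `alpha` and `beta` are hypothetical; no step-size schedule is implied.

```python
import numpy as np

def qd_update(Q, adjacency, x, u, costs, x_next, alpha, beta, gamma):
    """One assumed consensus + innovations step on the visited pair (x, u).

    Q         : array (n_agents, n_states, n_actions), one Q-table per agent
    adjacency : symmetric 0/1 array (n_agents, n_agents), current links
    costs     : array (n_agents,), each agent's local one-stage cost
    alpha     : innovation weight (hypothetical constant)
    beta      : consensus weight (hypothetical constant)
    gamma     : discount factor
    """
    q = Q[:, x, u]  # every agent's current estimate for (x, u)

    # Consensus term: for agent n, the sum over neighbors l of (q_n - q_l).
    degree = adjacency.sum(axis=1)
    disagreement = degree * q - adjacency @ q

    # Innovation term: local temporal-difference error. Costs are
    # minimized, so the bootstrap uses a min over next actions.
    td = costs + gamma * Q[:, x_next, :].min(axis=1) - q

    Q_new = Q.copy()
    Q_new[:, x, u] = q - beta * disagreement + alpha * td
    return Q_new


# Two agents, two states, two actions, one communication link.
Q0 = np.zeros((2, 2, 2))
A = np.array([[0.0, 1.0], [1.0, 0.0]])
local_costs = np.array([1.0, 3.0])
Q1 = qd_update(Q0, A, x=0, u=0, costs=local_costs, x_next=1,
               alpha=0.5, beta=0.1, gamma=0.9)
```

In this toy setup the agents start in agreement, so only the innovation term moves their estimates; once the estimates differ, the consensus term pulls them toward each other. With a symmetric `adjacency`, a consensus-only step (`alpha = 0`) leaves the network average of the estimates unchanged.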

Publisher Statement

© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Date

2012-04-01

Keywords

Multi-agent stochastic control; Multi-agent learning; Distributed Q-learning; Distributed reinforcement learning; Collaborative network processing; Consensus + innovations; Mixed time-scale stochastic approximation

Licence

In Copyright
