
Electronic data

  • PerkinsMertikopoulosLeslie15

    Rights statement: ©2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 277 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

  • 07364177

    Rights statement: ©2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Final published version, 323 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Links

Text available via DOI:


Mixed-strategy learning with continuous action sets

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Journal publication date: 01/2017
Journal: IEEE Transactions on Automatic Control
Issue number: 1
Volume: 62
Number of pages: 6
Pages (from-to): 379-384
Publication status: Published
Early online date: 23/12/15
Original language: English

Abstract

Motivated by the recent applications of game-theoretical learning to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets. We propose an actor-critic reinforcement learning algorithm that adapts mixed strategies over continuous action spaces. To analyse the algorithm, we extend the theory of finite-dimensional two-timescale stochastic approximation to a Banach space setting, and prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. These results combine to give a provably convergent learning algorithm in which players do not need to keep track of the controls selected by other agents.
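
Purely as an illustration of the two-timescale actor-critic pattern the abstract describes (and not the paper's algorithm), the sketch below shows a minimal single-player version in Python: a "critic" tracks payoff estimates on a fast timescale, while the "actor" adjusts a Boltzmann mixed strategy over a continuous action set on a slower one. The payoff function, the kernel critic, and the step-size exponents are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def payoff(a):
    # Stand-in concave payoff on [0, 1]; maximised at a = 0.7.
    return 1.0 - (a - 0.7) ** 2

# Critic: payoff estimates theta[i] attached to fixed centres on [0, 1].
centres = np.linspace(0.0, 1.0, 21)
theta = np.zeros_like(centres)
bandwidth = 0.1
temperature = 0.05

def kernel_weights(a):
    # Gaussian kernel weights of action a against the centres.
    w = np.exp(-0.5 * ((a - centres) / bandwidth) ** 2)
    return w / w.sum()

def sample_action(logits):
    # Actor: draw a centre from the Boltzmann mixed strategy induced by
    # the logits, then perturb it so the realised action is continuous.
    p = np.exp((logits - logits.max()) / temperature)
    p /= p.sum()
    i = rng.choice(len(centres), p=p)
    return float(np.clip(centres[i] + bandwidth * rng.standard_normal(), 0.0, 1.0))

logits = np.zeros_like(centres)  # actor's slowly adapted preferences

for n in range(1, 5001):
    gamma = n ** -0.6   # fast critic step size
    alpha = n ** -0.9   # slow actor step size (alpha/gamma -> 0)

    a = sample_action(logits)
    r = payoff(a)       # only the player's own payoff is observed;
                        # no other agents' controls are tracked

    theta += gamma * kernel_weights(a) * (r - theta)  # critic (fast)
    logits += alpha * (theta - logits)                # actor (slow)

print(f"strategy concentrates near a = {centres[np.argmax(logits)]:.2f} "
      f"(payoff maximiser is 0.70)")
```

The separation of step sizes is the point of the pattern: because alpha/gamma tends to zero, the critic effectively equilibrates between actor updates, which is the structure that the paper's Banach-space two-timescale analysis makes rigorous (here shown only in a one-player caricature).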

Bibliographic note

©2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.