Electronic data

  • PerkinsMertikopoulosLeslie15

    Rights statement: ©2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 276 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

  • 07364177

    Final published version, 323 KB, PDF document

    Available under license: CC BY

Mixed-strategy learning with continuous action sets

Research output: Contribution to journal › Journal article

Published
Journal publication date: 01/2017
Journal: IEEE Transactions on Automatic Control
Issue number: 1
Volume: 62
Number of pages: 6
Pages (from-to): 379-384
Publication status: Published
Early online date: 23/12/15
Original language: English

Abstract

Motivated by the recent applications of game-theoretical learning to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets. We propose an actor-critic reinforcement learning algorithm that adapts mixed strategies over continuous action spaces. To analyse the algorithm, we extend the theory of finite-dimensional two-timescale stochastic approximation to a Banach space setting, and prove that the continuous dynamics of the process converge to equilibrium in the case of potential games. These results combine to give a provably convergent learning algorithm in which players do not need to keep track of the controls selected by other agents.