Research output: Contribution to journal › Journal article › peer-review

Published

**Convergent multiple-timescales reinforcement learning algorithms in normal form games.** / Leslie, David S.

Research output: Contribution to journal › Journal article › peer-review

Leslie, DS 2003, 'Convergent multiple-timescales reinforcement learning algorithms in normal form games', *Annals of Applied Probability*, vol. 13, no. 4, pp. 1231-1251. <http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoap/1069786497>

Leslie, D. S. (2003). Convergent multiple-timescales reinforcement learning algorithms in normal form games. *Annals of Applied Probability*, *13*(4), 1231-1251. http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoap/1069786497

Leslie DS. Convergent multiple-timescales reinforcement learning algorithms in normal form games. Annals of Applied Probability. 2003;13(4):1231-1251.

@article{d57877af7e294f098e32b8bbae87ea7e,

title = "Convergent multiple-timescales reinforcement learning algorithms in normal form games",

abstract = "We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to Nash distribution in two-player zero-sum games and N -player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge--in particular, we consider the N -player matching pennies game and Shapley's variant of the rock--scissors--paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.",

author = "Leslie, {David S.}",

year = "2003",

language = "English",

volume = "13",

pages = "1231--1251",

journal = "Annals of Applied Probability",

issn = "1050-5164",

publisher = "Institute of Mathematical Statistics",

number = "4",

}

TY - JOUR

T1 - Convergent multiple-timescales reinforcement learning algorithms in normal form games

AU - Leslie, David S.

PY - 2003

Y1 - 2003

N2 - We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to Nash distribution in two-player zero-sum games and N -player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge--in particular, we consider the N -player matching pennies game and Shapley's variant of the rock--scissors--paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.

AB - We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to Nash distribution in two-player zero-sum games and N -player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge--in particular, we consider the N -player matching pennies game and Shapley's variant of the rock--scissors--paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.

M3 - Journal article

VL - 13

SP - 1231

EP - 1251

JO - Annals of Applied Probability

JF - Annals of Applied Probability

SN - 1050-5164

IS - 4

ER -