Best-response Dynamics in Zero-sum Stochastic Games

School of Mathematical Sciences

Associated organisational unit

Statistical Artificial Intelligence

Electronic data

StochasticGamesLearning_JETrevision
Accepted author manuscript, 373 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1016/j.jet.2020.105095
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

Stochastic games, Best-response dynamics, Zero-sum games, Convergence

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Best-response Dynamics in Zero-sum Stochastic Games. / Leslie, David; Perkins, Steven; Xu, Zibo.
In: Journal of Economic Theory, Vol. 189, 105095, 01.09.2020.

Harvard

Leslie, D, Perkins, S & Xu, Z 2020, 'Best-response Dynamics in Zero-sum Stochastic Games', Journal of Economic Theory, vol. 189, 105095. https://doi.org/10.1016/j.jet.2020.105095

APA

Leslie, D., Perkins, S., & Xu, Z. (2020). Best-response Dynamics in Zero-sum Stochastic Games. Journal of Economic Theory, 189, Article 105095. https://doi.org/10.1016/j.jet.2020.105095

Vancouver

Leslie D, Perkins S, Xu Z. Best-response Dynamics in Zero-sum Stochastic Games. Journal of Economic Theory. 2020 Sep 1;189:105095. Epub 2020 Jul 16. doi: 10.1016/j.jet.2020.105095

Author

Leslie, David ; Perkins, Steven ; Xu, Zibo. / Best-response Dynamics in Zero-sum Stochastic Games. In: Journal of Economic Theory. 2020 ; Vol. 189.

Bibtex

@article{ca0a3b35aa5442c0bc6052dae7453feb,
title = "Best-response Dynamics in Zero-sum Stochastic Games",
abstract = "We define and analyse three learning dynamics for two-player zero-sum discounted-payoff stochastic games. A continuous-time best-response dynamic in mixed strategies is proved to converge to the set of Nash equilibrium stationary strategies. Extending this, we introduce a fictitious-play-like process in a continuous-time embedding of a stochastic zero-sum game, which is again shown to converge to the set of Nash equilibrium strategies. Finally, we present a modified δ-converging best-response dynamic, in which the discount rate converges to 1, and the learned value converges to the asymptotic value of the zero-sum stochastic game. The critical feature of all the dynamic processes is a separation of adaption rates: beliefs about the value of states adapt more slowly than the strategies adapt, and in the case of the δ-converging dynamic the discount rate adapts more slowly than everything else.",
keywords = "Stochastic games, Best-response dynamics, Zero-sum games, Convergence",
author = "David Leslie and Steven Perkins and Zibo Xu",
year = "2020",
month = sep,
day = "1",
doi = "10.1016/j.jet.2020.105095",
language = "English",
volume = "189",
journal = "Journal of Economic Theory",
issn = "0022-0531",
publisher = "ELSEVIER ACADEMIC PRESS INC",
}

RIS

TY - JOUR

T1 - Best-response Dynamics in Zero-sum Stochastic Games

AU - Leslie, David

AU - Perkins, Steven

AU - Xu, Zibo

PY - 2020/9/1

Y1 - 2020/9/1

AB - We define and analyse three learning dynamics for two-player zero-sum discounted-payoff stochastic games. A continuous-time best-response dynamic in mixed strategies is proved to converge to the set of Nash equilibrium stationary strategies. Extending this, we introduce a fictitious-play-like process in a continuous-time embedding of a stochastic zero-sum game, which is again shown to converge to the set of Nash equilibrium strategies. Finally, we present a modified δ-converging best-response dynamic, in which the discount rate converges to 1, and the learned value converges to the asymptotic value of the zero-sum stochastic game. The critical feature of all the dynamic processes is a separation of adaption rates: beliefs about the value of states adapt more slowly than the strategies adapt, and in the case of the δ-converging dynamic the discount rate adapts more slowly than everything else.

KW - Stochastic games

KW - Best-response dynamics

KW - Zero-sum games

KW - Convergence

U2 - 10.1016/j.jet.2020.105095

DO - 10.1016/j.jet.2020.105095

M3 - Journal article

VL - 189

JO - Journal of Economic Theory

JF - Journal of Economic Theory

SN - 0022-0531

M1 - 105095

ER -
