Accepted author manuscript, 373 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Final published version
Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Best-response Dynamics in Zero-sum Stochastic Games
AU - Leslie, David
AU - Perkins, Steven
AU - Xu, Zibo
PY - 2020/9/1
Y1 - 2020/9/1
N2 - We define and analyse three learning dynamics for two-player zero-sum discounted-payoff stochastic games. A continuous-time best-response dynamic in mixed strategies is proved to converge to the set of Nash equilibrium stationary strategies. Extending this, we introduce a fictitious-play-like process in a continuous-time embedding of a stochastic zero-sum game, which is again shown to converge to the set of Nash equilibrium strategies. Finally, we present a modified δ-converging best-response dynamic, in which the discount rate converges to 1, and the learned value converges to the asymptotic value of the zero-sum stochastic game. The critical feature of all the dynamic processes is a separation of adaption rates: beliefs about the value of states adapt more slowly than the strategies adapt, and in the case of the δ-converging dynamic the discount rate adapts more slowly than everything else.
AB - We define and analyse three learning dynamics for two-player zero-sum discounted-payoff stochastic games. A continuous-time best-response dynamic in mixed strategies is proved to converge to the set of Nash equilibrium stationary strategies. Extending this, we introduce a fictitious-play-like process in a continuous-time embedding of a stochastic zero-sum game, which is again shown to converge to the set of Nash equilibrium strategies. Finally, we present a modified δ-converging best-response dynamic, in which the discount rate converges to 1, and the learned value converges to the asymptotic value of the zero-sum stochastic game. The critical feature of all the dynamic processes is a separation of adaption rates: beliefs about the value of states adapt more slowly than the strategies adapt, and in the case of the δ-converging dynamic the discount rate adapts more slowly than everything else.
KW - Stochastic games
KW - Best-response dynamics
KW - Zero-sum games
KW - CONVERGENCE
U2 - 10.1016/j.jet.2020.105095
DO - 10.1016/j.jet.2020.105095
M3 - Journal article
VL - 189
JO - Journal of Economic Theory
JF - Journal of Economic Theory
SN - 0022-0531
M1 - 105095
ER -