
Population-level reinforcement learning resulting in smooth best response dynamics

Research output: Working paper

Published


Bibtex

@techreport{09f9ce9d46d3473884428b224b3c7d1d,
title = "Population-level reinforcement learning resulting in smooth best response dynamics",
abstract = "Recent models of learning in games have attempted to produce individual-level learning algorithms that are asymptotically characterised by the replicator dynamics of evolutionary game theory. In contrast, we describe a population-level model which is characterised by the smooth best response dynamics, a system which is intrinsic to the theory of adaptive behaviour in individuals. This model is novel in that the population members are not required to make any game-theoretical calculations, and instead simply assess the values of actions based upon observed rewards. We prove that this process must converge to the Nash distribution in several classes of games, including zero-sum games, games with an interior ESS, partnership games and supermodular games. A numerical example confirms the value of our approach for the Rock–Scissors–Paper game.",
author = "Leslie, {David S.} and Collins, {E. J.}",
year = "2002",
language = "English",
pages = "1--13",
type = "WorkingPaper",
}

RIS

TY - UNPB

T1 - Population-level reinforcement learning resulting in smooth best response dynamics

AU - Leslie, David S.

AU - Collins, E. J.

PY - 2002

Y1 - 2002

N2 - Recent models of learning in games have attempted to produce individual-level learning algorithms that are asymptotically characterised by the replicator dynamics of evolutionary game theory. In contrast, we describe a population-level model which is characterised by the smooth best response dynamics, a system which is intrinsic to the theory of adaptive behaviour in individuals. This model is novel in that the population members are not required to make any game-theoretical calculations, and instead simply assess the values of actions based upon observed rewards. We prove that this process must converge to the Nash distribution in several classes of games, including zero-sum games, games with an interior ESS, partnership games and supermodular games. A numerical example confirms the value of our approach for the Rock–Scissors–Paper game.

AB - Recent models of learning in games have attempted to produce individual-level learning algorithms that are asymptotically characterised by the replicator dynamics of evolutionary game theory. In contrast, we describe a population-level model which is characterised by the smooth best response dynamics, a system which is intrinsic to the theory of adaptive behaviour in individuals. This model is novel in that the population members are not required to make any game-theoretical calculations, and instead simply assess the values of actions based upon observed rewards. We prove that this process must converge to the Nash distribution in several classes of games, including zero-sum games, games with an interior ESS, partnership games and supermodular games. A numerical example confirms the value of our approach for the Rock–Scissors–Paper game.

M3 - Working paper

SP - 1

EP - 13

BT - Population-level reinforcement learning resulting in smooth best response dynamics

ER -
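
The smooth best response dynamics named in the abstract can be illustrated with a short numerical sketch for the Rock–Scissors–Paper example the abstract mentions. This is not the paper's own population-level learning algorithm; it is a minimal simulation of the limiting dynamics, assuming a standard zero-sum RPS payoff matrix and a logit (softmax) smoothing with an illustrative parameter `beta`:

```python
import numpy as np

# Assumed payoff matrix for Rock-Scissors-Paper (row player), zero-sum.
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def logit_best_response(x, beta=10.0):
    """Smoothed best response to population state x via a logit choice rule."""
    u = A @ x                          # expected payoff of each pure action
    e = np.exp(beta * (u - u.max()))   # numerically stabilised softmax
    return e / e.sum()

# Euler discretisation of the smooth best response dynamics:
#   dx/dt = BR(x) - x
x = np.array([0.8, 0.1, 0.1])          # start far from equilibrium
dt = 0.01
for _ in range(20000):
    x = x + dt * (logit_best_response(x) - x)

print(np.round(x, 3))                  # approaches the Nash distribution
```

For this symmetric zero-sum game the trajectory spirals in to the uniform distribution (1/3, 1/3, 1/3), consistent with the convergence result for zero-sum games stated in the abstract.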