Learning in unknown reward games: application to sensor networks

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Learning in unknown reward games: application to sensor networks. / Chapman, A. C.; Leslie, David S.; Rogers, A. et al.
In: The Computer Journal, Vol. 57, No. 6, 01.06.2014, p. 875-892.

Harvard

Chapman, AC, Leslie, DS, Rogers, A & Jennings, NR 2014, 'Learning in unknown reward games: application to sensor networks', The Computer Journal, vol. 57, no. 6, pp. 875-892. https://doi.org/10.1093/comjnl/bxt082

APA

Chapman, A. C., Leslie, D. S., Rogers, A., & Jennings, N. R. (2014). Learning in unknown reward games: application to sensor networks. The Computer Journal, 57(6), 875-892. https://doi.org/10.1093/comjnl/bxt082

Vancouver

Chapman AC, Leslie DS, Rogers A, Jennings NR. Learning in unknown reward games: application to sensor networks. The Computer Journal. 2014 Jun 1;57(6):875-892. doi: 10.1093/comjnl/bxt082

Author

Chapman, A. C.; Leslie, David S.; Rogers, A. et al. / Learning in unknown reward games: application to sensor networks. In: The Computer Journal. 2014; Vol. 57, No. 6. pp. 875-892.

Bibtex

@article{878da57dcd114469a0bb006da2277f5f,
title = "Learning in unknown reward games: application to sensor networks",
abstract = "This paper demonstrates a decentralized method for optimization using game-theoretic multi-agent techniques, applied to a sensor network management problem. Our first major contribution is to show how the marginal contribution utility design is used to construct an unknown-reward potential game formulation of the problem. This formulation exploits the sparse structure of sensor network problems, and allows us to apply a bound to the price of anarchy of the Nash equilibria of the induced game. Furthermore, since the game is a potential game, solutions can be found using multi-agent learning techniques. The techniques we derive use Q-learning to estimate an agent's rewards, while an action adaptation process responds to an agent's opponents{\textquoteright} behaviour. However, there are many different algorithmic configurations that could be used to solve these games. Thus, our second major contribution is an extensive evaluation of several action adaptation processes. Specifically, we compare six algorithms across a variety of parameter settings to ascertain the quality of the solutions they produce, their speed of convergence and their robustness to pre-specified parameter choices. Our results show that they each perform similarly across a wide range of parameters. There is, however, a significant effect from moving to a learning policy with sampling probabilities that go to zero too quickly for rewards to be accurately estimated. ",
keywords = "potential games, learning in games, distributed optimization",
author = "Chapman, {A. C.} and Leslie, {David S.} and A. Rogers and Jennings, {N. R.}",
year = "2014",
month = jun,
day = "1",
doi = "10.1093/comjnl/bxt082",
language = "English",
volume = "57",
pages = "875--892",
journal = "The Computer Journal",
issn = "0010-4620",
publisher = "Oxford University Press",
number = "6",

}

RIS

TY - JOUR

T1 - Learning in unknown reward games

T2 - application to sensor networks

AU - Chapman, A. C.

AU - Leslie, David S.

AU - Rogers, A.

AU - Jennings, N. R.

PY - 2014/6/1

Y1 - 2014/6/1

N2 - This paper demonstrates a decentralized method for optimization using game-theoretic multi-agent techniques, applied to a sensor network management problem. Our first major contribution is to show how the marginal contribution utility design is used to construct an unknown-reward potential game formulation of the problem. This formulation exploits the sparse structure of sensor network problems, and allows us to apply a bound to the price of anarchy of the Nash equilibria of the induced game. Furthermore, since the game is a potential game, solutions can be found using multi-agent learning techniques. The techniques we derive use Q-learning to estimate an agent's rewards, while an action adaptation process responds to an agent's opponents’ behaviour. However, there are many different algorithmic configurations that could be used to solve these games. Thus, our second major contribution is an extensive evaluation of several action adaptation processes. Specifically, we compare six algorithms across a variety of parameter settings to ascertain the quality of the solutions they produce, their speed of convergence and their robustness to pre-specified parameter choices. Our results show that they each perform similarly across a wide range of parameters. There is, however, a significant effect from moving to a learning policy with sampling probabilities that go to zero too quickly for rewards to be accurately estimated.

AB - This paper demonstrates a decentralized method for optimization using game-theoretic multi-agent techniques, applied to a sensor network management problem. Our first major contribution is to show how the marginal contribution utility design is used to construct an unknown-reward potential game formulation of the problem. This formulation exploits the sparse structure of sensor network problems, and allows us to apply a bound to the price of anarchy of the Nash equilibria of the induced game. Furthermore, since the game is a potential game, solutions can be found using multi-agent learning techniques. The techniques we derive use Q-learning to estimate an agent's rewards, while an action adaptation process responds to an agent's opponents’ behaviour. However, there are many different algorithmic configurations that could be used to solve these games. Thus, our second major contribution is an extensive evaluation of several action adaptation processes. Specifically, we compare six algorithms across a variety of parameter settings to ascertain the quality of the solutions they produce, their speed of convergence and their robustness to pre-specified parameter choices. Our results show that they each perform similarly across a wide range of parameters. There is, however, a significant effect from moving to a learning policy with sampling probabilities that go to zero too quickly for rewards to be accurately estimated.

KW - potential games

KW - learning in games

KW - distributed optimization

U2 - 10.1093/comjnl/bxt082

DO - 10.1093/comjnl/bxt082

M3 - Journal article

VL - 57

SP - 875

EP - 892

JO - The Computer Journal

JF - The Computer Journal

SN - 0010-4620

IS - 6

ER -
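
To make the approach outlined in the abstract concrete, below is a minimal illustrative sketch (not the authors' code) of an unknown-reward coverage game: each sensor receives a marginal-contribution utility, estimates its per-action rewards with Q-learning, and adapts its action with a simple epsilon-greedy rule that stands in for the six action-adaptation processes compared in the paper. It is written in Python; the coverage model, all names and the exploration-decay rate are assumptions made purely for illustration.

# Illustrative sketch only: marginal-contribution utilities in a coverage game,
# Q-learning estimates of the unknown per-action rewards, and an epsilon-greedy
# action-adaptation rule. All parameters below are assumptions for the example.
import random

NUM_SENSORS = 6          # agents
NUM_TARGETS = 12         # points of interest to cover
ACTIONS = 4              # sensing directions per sensor
ROUNDS = 2000

random.seed(0)

# coverage[i][a] = set of targets sensor i covers when it plays action a
coverage = [[set(random.sample(range(NUM_TARGETS), 3)) for _ in range(ACTIONS)]
            for _ in range(NUM_SENSORS)]

def welfare(joint):
    """Global objective: number of distinct targets covered by the joint action."""
    covered = set()
    for i, a in enumerate(joint):
        covered |= coverage[i][a]
    return len(covered)

def marginal_utility(i, joint):
    """Marginal-contribution (wonderful-life) utility of sensor i."""
    covered_without_i = set()
    for j, a in enumerate(joint):
        if j != i:
            covered_without_i |= coverage[j][a]
    return welfare(joint) - len(covered_without_i)

# Q-learning estimates of each agent's per-action reward (rewards are unknown a priori)
Q = [[0.0] * ACTIONS for _ in range(NUM_SENSORS)]
alpha = 0.1              # reward-estimate learning rate

joint = [random.randrange(ACTIONS) for _ in range(NUM_SENSORS)]
for t in range(1, ROUNDS + 1):
    # Exploration decays like t**-0.3: slow enough that every action keeps being
    # sampled, so the Q estimates stay accurate (the abstract notes that decaying
    # too quickly degrades the solutions found).
    eps = t ** -0.3
    for i in range(NUM_SENSORS):
        if random.random() < eps:
            joint[i] = random.randrange(ACTIONS)                    # explore
        else:
            joint[i] = max(range(ACTIONS), key=lambda a: Q[i][a])   # adapt to current estimates
    for i in range(NUM_SENSORS):
        r = marginal_utility(i, joint)                   # only the observed payoff is available
        Q[i][joint[i]] += alpha * (r - Q[i][joint[i]])   # Q-learning update

print("final welfare:", welfare(joint), "of", NUM_TARGETS)

Under these assumptions, changing the decay rate of eps (for example t ** -1 instead of t ** -0.3) makes the Q estimates track the marginal utilities less accurately, which mirrors the sensitivity to the exploration schedule that the abstract reports.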