Electronic data

  • BravoLeslieMertikopoulosNIPS2018.pdf

    Accepted author manuscript, 307 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Bandit learning in concave N-player games

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper

Published

Standard

Bandit learning in concave N-player games. / Bravo, Mario; Leslie, David Stuart; Mertikopoulos, Panayotis.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. ed. / S. Bengio; H. Wallach; H. Larochelle; K. Grauman; N. Cesa-Bianchi; R. Garnett. 2018. p. 5661-5671.

Harvard

Bravo, M, Leslie, DS & Mertikopoulos, P 2018, Bandit learning in concave N-player games. in S Bengio, H Wallach, H Larochelle, K Grauman, N Cesa-Bianchi & R Garnett (eds), 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. pp. 5661-5671.

APA

Bravo, M., Leslie, D. S., & Mertikopoulos, P. (2018). Bandit learning in concave N-player games. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. (pp. 5661-5671)

Vancouver

Bravo M, Leslie DS, Mertikopoulos P. Bandit learning in concave N-player games. In Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors, 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. 2018. p. 5661-5671

Author

Bravo, Mario ; Leslie, David Stuart ; Mertikopoulos, Panayotis. / Bandit learning in concave N-player games. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. editor / S. Bengio ; H. Wallach ; H. Larochelle ; K. Grauman ; N. Cesa-Bianchi ; R. Garnett. 2018. pp. 5661-5671

BibTeX

@inproceedings{855b74bce7654edcaf66804c812947e4,
title = "Bandit learning in concave N-player games",
abstract = "This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.",
author = "Mario Bravo and Leslie, {David Stuart} and Panayotis Mertikopoulos",
year = "2018",
month = "12",
day = "2",
language = "English",
pages = "5661--5671",
editor = "S. Bengio and Wallach, {H.} and H. Larochelle and K. Grauman and N. Cesa-Bianchi and Garnett, {R.}",
booktitle = "32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr{\'e}al, Canada",
}

RIS

TY - GEN

T1 - Bandit learning in concave N-player games

AU - Bravo, Mario

AU - Leslie, David Stuart

AU - Mertikopoulos, Panayotis

PY - 2018/12/2

Y1 - 2018/12/2

N2 - This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.

AB - This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.

M3 - Conference contribution/Paper

SP - 5661

EP - 5671

BT - 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada

A2 - Bengio, S.

A2 - Wallach, H.

A2 - Larochelle, H.

A2 - Grauman, K.

A2 - Cesa-Bianchi, N.

A2 - Garnett, R.

ER -
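
Illustrative code sketch

The abstract describes no-regret learning via mirror descent driven by bandit (payoff-only) feedback. The sketch below is a hedged illustration, not the paper's exact algorithm: it runs the Euclidean special case of mirror descent (projected gradient ascent) with a one-point gradient estimate on a hypothetical two-player monotone concave game. The quadratic payoffs, the ±1 perturbations, and the step-size/query-radius schedules are all assumptions chosen for demonstration.

# Sketch only: one-point bandit feedback + mirror descent (Euclidean case,
# i.e. projected gradient ascent) on a hypothetical monotone concave game.
import numpy as np

rng = np.random.default_rng(0)

def payoff(i, x):
    # Assumed concave quadratic payoffs; the induced game is strongly
    # monotone, with unique Nash equilibrium at (0, 0).
    return -x[i] ** 2 + x[i] * x[1 - i]

def project(z, lo, hi):
    # Euclidean projection onto the strategy interval [lo, hi].
    return np.clip(z, lo, hi)

x = np.array([0.9, -0.8])  # initial strategies in [-1, 1]
for t in range(1, 200_001):
    gamma = 1.0 / t            # step size (illustrative schedule)
    delta = t ** (-1.0 / 3.0)  # query radius, shrinking over time
    z = rng.choice([-1.0, 1.0], size=2)  # random perturbation directions
    # Query a feasible perturbed point: pull x into a delta-shrunk set
    # first so the perturbed strategies stay inside [-1, 1].
    x_play = project(x, -1 + delta, 1 - delta) + delta * z
    # Each player observes only its own realized payoff at x_play and forms
    # a one-point estimate of its individual payoff gradient.
    g = np.array([payoff(i, x_play) * z[i] / delta for i in range(2)])
    x = project(x + gamma * g, -1.0, 1.0)  # mirror-descent (gradient) update

print(x)  # drifts toward the Nash equilibrium (0, 0)

Under a monotonicity condition of the kind the abstract invokes, iterates of this type stabilize at Nash equilibrium; the gamma = 1/t and delta = t^(-1/3) schedules here are one standard choice for balancing the estimator's bias (which grows with delta) against its variance (which grows as delta shrinks), and are assumptions of this sketch rather than the paper's prescription.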