Final published version, 393 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Approximations of the Restless Bandit Problem
AU - Grünewälder, Steffen
AU - Khaleghi, Azadeh
PY - 2019/2/1
Y1 - 2019/2/1
N2 - The multi-armed restless bandit problem is studied in the case where the pay-off distributions are stationary φ-mixing. This version of the problem provides a more realistic model for most real-world applications, but it cannot be solved optimally in practice, since it is known to be PSPACE-hard. The objective of this paper is to characterize a sub-class of the problem where good approximate solutions can be found using tractable approaches. Specifically, it is shown that, under some conditions on the φ-mixing coefficients, a modified version of UCB can prove effective. The main challenge is that, unlike in the i.i.d. setting, the distributions of the sampled pay-offs may not have the same characteristics as those of the original bandit arms. In particular, the φ-mixing property does not necessarily carry over. This is overcome by carefully controlling the effect of the sampling policy on the pay-off distributions. Some of the proof techniques developed in this paper can be used more generally in the context of online sampling under dependence. The proposed algorithms are accompanied by a corresponding regret analysis.
AB - The multi-armed restless bandit problem is studied in the case where the pay-off distributions are stationary φ-mixing. This version of the problem provides a more realistic model for most real-world applications, but it cannot be solved optimally in practice, since it is known to be PSPACE-hard. The objective of this paper is to characterize a sub-class of the problem where good approximate solutions can be found using tractable approaches. Specifically, it is shown that, under some conditions on the φ-mixing coefficients, a modified version of UCB can prove effective. The main challenge is that, unlike in the i.i.d. setting, the distributions of the sampled pay-offs may not have the same characteristics as those of the original bandit arms. In particular, the φ-mixing property does not necessarily carry over. This is overcome by carefully controlling the effect of the sampling policy on the pay-off distributions. Some of the proof techniques developed in this paper can be used more generally in the context of online sampling under dependence. The proposed algorithms are accompanied by a corresponding regret analysis.
M3 - Journal article
VL - 20
SP - 1
EP - 37
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
SN - 1532-4435
IS - 14
ER -
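
For context on the baseline method the abstract refers to, below is a minimal Python sketch of the classical UCB1 index policy. It is illustrative only: it shows the standard index (empirical mean plus a sqrt(2 ln t / n_i) confidence width) for i.i.d. arms, not the paper's modified algorithm, whose confidence term is adjusted to account for φ-mixing dependence. The function name and the toy Bernoulli arms are assumptions made for this sketch.

import math
import random

def ucb1(arms, horizon):
    # Classical UCB1 (Auer et al., 2002): pull each arm once, then play the
    # arm maximising (empirical mean) + sqrt(2 * ln t / n_i).
    k = len(arms)
    n = [0] * k          # number of pulls per arm
    mean = [0.0] * k     # empirical mean pay-off per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1    # initialisation round: try every arm once
        else:
            i = max(range(k),
                    key=lambda a: mean[a] + math.sqrt(2.0 * math.log(t) / n[a]))
        r = arms[i]()                     # observe a pay-off from arm i
        n[i] += 1
        mean[i] += (r - mean[i]) / n[i]   # incremental running-mean update
        total += r
    return total

# Toy usage: three i.i.d. Bernoulli arms. In the paper's restless setting the
# pay-off sequences are instead stationary φ-mixing processes, so this i.i.d.
# confidence width is no longer valid as stated.
random.seed(0)
arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
print(ucb1(arms, horizon=2000))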