Discounted multi-armed bandit problems on a collection of machines with varying speeds

Mathematics and Statistics

Text available via DOI:

https://doi.org/10.1287/moor.1030.0068
Final published version

Keywords

average reward optimality, Blackwell optimality, Gittins index, multiarmed bandit, sensitive discount optimality

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Discounted multi-armed bandit problems on a collection of machines with varying speeds. / Glazebrook, Kevin; Dunn, R. T.
In: Mathematics of Operations Research, Vol. 29, No. 2, 2004, p. 266-279.

Bibtex

@article{2b7e177123ed4755bf98b7bf6f2bfb2e,

title = "Discounted multi-armed bandit problems on a collection of machines with varying speeds",

abstract = "This paper is the first to consider general multiarmed bandit problems on parallel machines working at different speeds. Block allocation policies make a once-for-all allocation of bandits to machines at time zero. In this class we describe how to achieve Blackwell optimality under given conditions. The block allocation policy identified allocates the bandits with the largest guaranteed reward rates to the machines operating at greatest speed. This policy is shown to be average-reward optimal in the class of general (nonanticipative, nonidling) policies.",

keywords = "average reward optimality, Blackwell optimality, Gittins index, multiarmed bandit, sensitive discount optimality",

author = "Glazebrook, Kevin and Dunn, {R. T.}",

note = "RAE_import_type : Journal article RAE_uoa_type : Statistics and Operational Research",

year = "2004",

doi = "10.1287/moor.1030.0068",

language = "English",

volume = "29",

pages = "266--279",

journal = "Mathematics of Operations Research",

issn = "0364-765X",

publisher = "INFORMS Inst. for Operations Res. and the Management Sciences",

number = "2",

}

RIS

TY - JOUR

T1 - Discounted multi-armed bandit problems on a collection of machines with varying speeds

AU - Glazebrook, Kevin

AU - Dunn, R. T.

N1 - RAE_import_type : Journal article RAE_uoa_type : Statistics and Operational Research

PY - 2004

Y1 - 2004

N2 - This paper is the first to consider general multiarmed bandit problems on parallel machines working at different speeds. Block allocation policies make a once-for-all allocation of bandits to machines at time zero. In this class we describe how to achieve Blackwell optimality under given conditions. The block allocation policy identified allocates the bandits with the largest guaranteed reward rates to the machines operating at greatest speed. This policy is shown to be average-reward optimal in the class of general (nonanticipative, nonidling) policies.

AB - This paper is the first to consider general multiarmed bandit problems on parallel machines working at different speeds. Block allocation policies make a once-for-all allocation of bandits to machines at time zero. In this class we describe how to achieve Blackwell optimality under given conditions. The block allocation policy identified allocates the bandits with the largest guaranteed reward rates to the machines operating at greatest speed. This policy is shown to be average-reward optimal in the class of general (nonanticipative, nonidling) policies.

KW - average reward optimality

KW - Blackwell optimality

KW - Gittins index

KW - multiarmed bandit

KW - sensitive discount optimality

U2 - 10.1287/moor.1030.0068

DO - 10.1287/moor.1030.0068

M3 - Journal article

VL - 29

SP - 266

EP - 279

JO - Mathematics of Operations Research

JF - Mathematics of Operations Research

SN - 0364-765X

IS - 2

ER -
