Research output: Contribution to Journal/Magazine › Journal article › peer-review
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Discounted multi-armed bandit problems on a collection of machines with varying speeds
AU - Glazebrook, Kevin
AU - Dunn, R. T.
N1 - RAE_import_type : Journal article RAE_uoa_type : Statistics and Operational Research
PY - 2004
Y1 - 2004
N2 - This paper is the first to consider general multiarmed bandit problems on parallel machines working at different speeds. Block allocation policies make a once-for-all allocation of bandits to machines at time zero. In this class we describe how to achieve Blackwell optimality under given conditions. The block allocation policy identified allocates the bandits with the largest guaranteed reward rates to the machines operating at greatest speed. This policy is shown to be average-reward optimal in the class of general (nonanticipative, nonidling) policies.
AB - This paper is the first to consider general multiarmed bandit problems on parallel machines working at different speeds. Block allocation policies make a once-for-all allocation of bandits to machines at time zero. In this class we describe how to achieve Blackwell optimality under given conditions. The block allocation policy identified allocates the bandits with the largest guaranteed reward rates to the machines operating at greatest speed. This policy is shown to be average-reward optimal in the class of general (nonanticipative, nonidling) policies.
KW - average reward optimality
KW - Blackwell optimality
KW - Gittins index
KW - multiarmed bandit
KW - sensitive discount optimality
U2 - 10.1287/moor.1030.0068
DO - 10.1287/moor.1030.0068
M3 - Journal article
VL - 29
SP - 266
EP - 279
JO - Mathematics of Operations Research
JF - Mathematics of Operations Research
SN - 0364-765X
IS - 2
ER -