Discounted multi-armed bandit problems on a collection of machines with varying speeds

School of Mathematical Sciences

Text available via DOI:

https://doi.org/10.1287/moor.1030.0068
Final published version

Keywords

average reward optimality, Blackwell optimality, Gittins index, multiarmed bandit, sensitive discount optimality

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Kevin Glazebrook
R. T. Dunn

Journal publication date	2004
Journal	Mathematics of Operations Research
Issue number	2
Volume	29
Number of pages	14
Pages (from-to)	266-279
Publication Status	Published
Original language	English

Abstract

This paper is the first to consider general multiarmed bandit problems on parallel machines working at different speeds. Block allocation policies make a once-for-all allocation of bandits to machines at time zero. In this class we describe how to achieve Blackwell optimality under given conditions. The block allocation policy identified allocates the bandits with the largest guaranteed reward rates to the machines operating at greatest speed. This policy is shown to be average-reward optimal in the class of general (nonanticipative, nonidling) policies.
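The allocation rule in the abstract can be illustrated with a small sketch. This is not the paper's construction: it assumes each bandit's guaranteed reward rate is already known as a single number, that each machine receives at most one bandit, and that all names (`block_allocate`, `reward_rates`, `speeds`) are hypothetical. It only shows the matching step, pairing the highest-rate bandits with the fastest machines once, at time zero.

```python
from typing import Dict, List, Tuple


def block_allocate(
    reward_rates: Dict[str, float],
    speeds: Dict[str, float],
) -> List[Tuple[str, str]]:
    """Pair bandits with machines once, at time zero.

    Assumption (not from the paper): one bandit per machine, and each
    bandit's guaranteed reward rate is supplied as a plain number.
    """
    # Bandits ordered from largest to smallest guaranteed reward rate.
    bandits = sorted(reward_rates, key=lambda b: reward_rates[b], reverse=True)
    # Machines ordered from fastest to slowest.
    machines = sorted(speeds, key=lambda m: speeds[m], reverse=True)
    # zip stops at the shorter list, so surplus bandits (or machines)
    # are simply left unassigned in this simplified sketch.
    return list(zip(bandits, machines))


if __name__ == "__main__":
    rates = {"a": 0.2, "b": 0.9, "c": 0.5}
    machine_speeds = {"m1": 1.0, "m2": 3.0}
    for bandit, machine in block_allocate(rates, machine_speeds):
        print(f"bandit {bandit} -> machine {machine}")
```

With three bandits and two machines, the sketch assigns the two highest-rate bandits and leaves the third unallocated; how the paper treats unassigned bandits, and how it defines the guaranteed reward rate, is not stated in the abstract.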

Bibliographic note

RAE_import_type: Journal article
RAE_uoa_type: Statistics and Operational Research
