On the asymptotic optimality of greedy index heuristics for multi-action restless bandits

Management Science

Associated organisational units

Electronic data

2014_HGFinal_Cut
Submitted manuscript, 249 KB, PDF document

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

On the asymptotic optimality of greedy index heuristics for multi-action restless bandits. / Hodge, David; Glazebrook, Kevin.
In: Advances in Applied Probability, Vol. 47, No. 3, 2015, p. 652-667.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Bibtex

@article{4b0388e8899343dcabb7dfa22f6070a9,

title = "On the asymptotic optimality of greedy index heuristics for multi-action restless bandits",

abstract = "The class of restless bandits as proposed by Whittle (1988) have long been known to be intractable. This paper presents an optimality result which extends that of Weber and Weiss (1990) for restless bandits to a more general setting in which individual bandits have multiple levels of activation but are subject to an overall resource constraint. The contribution is motivated by the recent works of Glazebrook et al. (2011a), (2011b) who discussed the performance of index heuristics for resource allocation in such systems. Hitherto, index heuristics have been shown, under a condition of full indexability, to be optimal for a natural Lagrangian relaxation of such problems in which a resource is purchased rather than constrained. We find that under key assumptions about the nature of solutions to a deterministic differential equation that the index heuristics above are asymptotically optimal in a sense described by Whittle. We then demonstrate that these assumptions always hold for three-state bandits. ",

author = "David Hodge and Kevin Glazebrook",

year = "2015",

language = "English",

volume = "47",

pages = "652--667",

journal = "Advances in Applied Probability",

issn = "0001-8678",

publisher = "Cambridge University Press",

number = "3",

}

RIS

TY - JOUR

T1 - On the asymptotic optimality of greedy index heuristics for multi-action restless bandits

AU - Hodge, David

AU - Glazebrook, Kevin

PY - 2015

Y1 - 2015

N2 - The class of restless bandits as proposed by Whittle (1988) have long been known to be intractable. This paper presents an optimality result which extends that of Weber and Weiss (1990) for restless bandits to a more general setting in which individual bandits have multiple levels of activation but are subject to an overall resource constraint. The contribution is motivated by the recent works of Glazebrook et al. (2011a), (2011b) who discussed the performance of index heuristics for resource allocation in such systems. Hitherto, index heuristics have been shown, under a condition of full indexability, to be optimal for a natural Lagrangian relaxation of such problems in which a resource is purchased rather than constrained. We find that under key assumptions about the nature of solutions to a deterministic differential equation that the index heuristics above are asymptotically optimal in a sense described by Whittle. We then demonstrate that these assumptions always hold for three-state bandits.

AB - The class of restless bandits as proposed by Whittle (1988) have long been known to be intractable. This paper presents an optimality result which extends that of Weber and Weiss (1990) for restless bandits to a more general setting in which individual bandits have multiple levels of activation but are subject to an overall resource constraint. The contribution is motivated by the recent works of Glazebrook et al. (2011a), (2011b) who discussed the performance of index heuristics for resource allocation in such systems. Hitherto, index heuristics have been shown, under a condition of full indexability, to be optimal for a natural Lagrangian relaxation of such problems in which a resource is purchased rather than constrained. We find that under key assumptions about the nature of solutions to a deterministic differential equation that the index heuristics above are asymptotically optimal in a sense described by Whittle. We then demonstrate that these assumptions always hold for three-state bandits.

M3 - Journal article

VL - 47

SP - 652

EP - 667

JO - Advances in Applied Probability

JF - Advances in Applied Probability

SN - 0001-8678

IS - 3

ER -

Research

Associated organisational units

Electronic data

Links