Heuristic and reinforcement learning methods for admission and routing control in a queueing system with multiple heterogeneous facilities

Management Science

Associated organisational units

Research output: Contribution to conference - Without ISBN/ISSN › Abstract

Published

Standard

Heuristic and reinforcement learning methods for admission and routing control in a queueing system with multiple heterogeneous facilities. / Shone, Robert; Knight, V.A.; Harper, P.R.
2021. Abstract from 31st European Conference on Operational Research, Athens, Greece.

Research output: Contribution to conference - Without ISBN/ISSN › Abstract

Bibtex

@conference{b315514722884983b4e166a8b2ed2b6b,

title = "Heuristic and reinforcement learning methods for admission and routing control in a queueing system with multiple heterogeneous facilities",

abstract = "Methods for optimising measures of social utility in queueing systems usually require interventions to be made in order to prevent customers from pursuing their own narrow self-interest. In this talk we consider a multiple-facility queueing system with both admission and routing control. The facilities are heterogeneous, with different server capacities, service rates and cost-reward structures, and we define a “socially optimal” policy as one that maximises the long-run average net reward per unit time. Although such policies can theoretically be found using stochastic dynamic programming, examples can be found which show that they may possess surprising and counter-intuitive properties, and thus there are very few general structural properties that can be proved.It is possible, however, to show that the set of positive recurrent states under a socially optimal policy is contained within the corresponding set of states under a “selfish” policy. This has useful implications for computational algorithms, although we still encounter the “curse of dimensionality” in problems with a large number of facilities. For larger problems, we discuss the use of various heuristic methods, including those derived from optimal solutions to multi-armed bandit problems. We also consider reinforcement learning methods, in which progressive value function estimates are obtained by simulating the random evolution of the system.",

author = "Robert Shone and V.A. Knight and P.R. Harper",

year = "2021",

month = jul,

day = "14",

language = "English",

note = "31st European Conference on Operational Research ; Conference date: 11-07-2021 Through 14-07-2021",

}

RIS

TY - CONF

T1 - Heuristic and reinforcement learning methods for admission and routing control in a queueing system with multiple heterogeneous facilities

AU - Shone, Robert

AU - Knight, V.A.

AU - Harper, P.R.

N1 - Conference code: 31

PY - 2021/7/14

Y1 - 2021/7/14

N2 - Methods for optimising measures of social utility in queueing systems usually require interventions to be made in order to prevent customers from pursuing their own narrow self-interest. In this talk we consider a multiple-facility queueing system with both admission and routing control. The facilities are heterogeneous, with different server capacities, service rates and cost-reward structures, and we define a “socially optimal” policy as one that maximises the long-run average net reward per unit time. Although such policies can theoretically be found using stochastic dynamic programming, examples can be found which show that they may possess surprising and counter-intuitive properties, and thus there are very few general structural properties that can be proved.It is possible, however, to show that the set of positive recurrent states under a socially optimal policy is contained within the corresponding set of states under a “selfish” policy. This has useful implications for computational algorithms, although we still encounter the “curse of dimensionality” in problems with a large number of facilities. For larger problems, we discuss the use of various heuristic methods, including those derived from optimal solutions to multi-armed bandit problems. We also consider reinforcement learning methods, in which progressive value function estimates are obtained by simulating the random evolution of the system.

AB - Methods for optimising measures of social utility in queueing systems usually require interventions to be made in order to prevent customers from pursuing their own narrow self-interest. In this talk we consider a multiple-facility queueing system with both admission and routing control. The facilities are heterogeneous, with different server capacities, service rates and cost-reward structures, and we define a “socially optimal” policy as one that maximises the long-run average net reward per unit time. Although such policies can theoretically be found using stochastic dynamic programming, examples can be found which show that they may possess surprising and counter-intuitive properties, and thus there are very few general structural properties that can be proved.It is possible, however, to show that the set of positive recurrent states under a socially optimal policy is contained within the corresponding set of states under a “selfish” policy. This has useful implications for computational algorithms, although we still encounter the “curse of dimensionality” in problems with a large number of facilities. For larger problems, we discuss the use of various heuristic methods, including those derived from optimal solutions to multi-armed bandit problems. We also consider reinforcement learning methods, in which progressive value function estimates are obtained by simulating the random evolution of the system.

M3 - Abstract

T2 - 31st European Conference on Operational Research

Y2 - 11 July 2021 through 14 July 2021

ER -

Research

Associated organisational units