Home > Research > Publications & Outputs > Heuristic and reinforcement learning methods fo...
View graph of relations

Heuristic and reinforcement learning methods for admission and routing control in a queueing system with multiple heterogeneous facilities

Research output: Contribution to conference - Without ISBN/ISSN Abstract

Published
Close
Publication date14/07/2021
<mark>Original language</mark>English
Event31st European Conference on Operational Research - Athens, Greece
Duration: 11/07/202114/07/2021
Conference number: 31

Conference

Conference31st European Conference on Operational Research
Country/TerritoryGreece
CityAthens
Period11/07/2114/07/21

Abstract

Methods for optimising measures of social utility in queueing systems usually require interventions to be made in order to prevent customers from pursuing their own narrow self-interest. In this talk we consider a multiple-facility queueing system with both admission and routing control. The facilities are heterogeneous, with different server capacities, service rates and cost-reward structures, and we define a “socially optimal” policy as one that maximises the long-run average net reward per unit time. Although such policies can theoretically be found using stochastic dynamic programming, examples can be found which show that they may possess surprising and counter-intuitive properties, and thus there are very few general structural properties that can be proved.

It is possible, however, to show that the set of positive recurrent states under a socially optimal policy is contained within the corresponding set of states under a “selfish” policy. This has useful implications for computational algorithms, although we still encounter the “curse of dimensionality” in problems with a large number of facilities. For larger problems, we discuss the use of various heuristic methods, including those derived from optimal solutions to multi-armed bandit problems. We also consider reinforcement learning methods, in which progressive value function estimates are obtained by simulating the random evolution of the system.