Final published version
Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - Adaptive policies for perimeter surveillance problems
AU - Grant, James A.
AU - Leslie, David S.
AU - Glazebrook, Kevin
AU - Szechtman, Roberto
AU - Letchford, Adam
PY - 2020/5/16
Y1 - 2020/5/16
N2 - We consider the problem of sequentially choosing observation regions along a line, with an aim of maximising the detection of events of interest. Such a problem may arise when monitoring the movements of endangered or migratory species, detecting crossings of a border, policing activities at sea, and in many other settings. In each case, the key operational challenge is to learn an allocation of surveillance resources which maximises successful detection of events of interest. We present a combinatorial multi-armed bandit model with Poisson rewards and a novel filtered feedback mechanism - arising from the failure to detect certain intrusions - where reward distributions are dependent on the actions selected. Our solution method is an upper confidence bound approach and we derive upper and lower bounds on its expected performance. We prove that the gap between these bounds is of constant order, and demonstrate empirically that our approach is more reliable in simulated problems than competing algorithms.
AB - We consider the problem of sequentially choosing observation regions along a line, with an aim of maximising the detection of events of interest. Such a problem may arise when monitoring the movements of endangered or migratory species, detecting crossings of a border, policing activities at sea, and in many other settings. In each case, the key operational challenge is to learn an allocation of surveillance resources which maximises successful detection of events of interest. We present a combinatorial multi-armed bandit model with Poisson rewards and a novel filtered feedback mechanism - arising from the failure to detect certain intrusions - where reward distributions are dependent on the actions selected. Our solution method is an upper confidence bound approach and we derive upper and lower bounds on its expected performance. We prove that the gap between these bounds is of constant order, and demonstrate empirically that our approach is more reliable in simulated problems than competing algorithms.
KW - cs.LG
KW - stat.ML
KW - Applied probability
KW - Stochastic processes
KW - Uncertainty modelling
KW - OR in defence
U2 - 10.1016/j.ejor.2019.11.004
DO - 10.1016/j.ejor.2019.11.004
M3 - Journal article
VL - 283
SP - 265
EP - 278
JO - European Journal of Operational Research
JF - European Journal of Operational Research
SN - 0377-2217
IS - 1
ER -