Electronic data

  • FPPBanditEJOR-9

    Accepted author manuscript, 4.27 MB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Links

Text available via DOI: https://doi.org/10.1016/j.ejor.2021.03.033

Filtered Poisson process bandit on a continuum

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Filtered Poisson process bandit on a continuum. / Grant, James; Szechtman, Roberto.
In: European Journal of Operational Research, Vol. 295, No. 2, 31.12.2021, p. 575-586.

Harvard

Grant, J & Szechtman, R 2021, 'Filtered Poisson process bandit on a continuum', European Journal of Operational Research, vol. 295, no. 2, pp. 575-586. https://doi.org/10.1016/j.ejor.2021.03.033

APA

Grant, J., & Szechtman, R. (2021). Filtered Poisson process bandit on a continuum. European Journal of Operational Research, 295(2), 575-586. https://doi.org/10.1016/j.ejor.2021.03.033

Vancouver

Grant J, Szechtman R. Filtered Poisson process bandit on a continuum. European Journal of Operational Research. 2021 Dec 31;295(2):575-586. Epub 2021 Mar 24. doi: 10.1016/j.ejor.2021.03.033

Author

Grant, James ; Szechtman, Roberto. / Filtered Poisson process bandit on a continuum. In: European Journal of Operational Research. 2021 ; Vol. 295, No. 2. pp. 575-586.

Bibtex

@article{ac485698b3654665b75e0f3f14cdc8ef,
title = "Filtered Poisson process bandit on a continuum",
abstract = "We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys $\tilde{O}(T^{2/3})$ regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.",
keywords = "Applied probability, Poisson processes, Multi-armed bandit, Machine learning",
author = "James Grant and Roberto Szechtman",
year = "2021",
month = dec,
day = "31",
doi = "10.1016/j.ejor.2021.03.033",
language = "English",
volume = "295",
pages = "575--586",
journal = "European Journal of Operational Research",
issn = "0377-2217",
publisher = "Elsevier Science B.V.",
number = "2",
}

RIS

TY - JOUR

T1 - Filtered Poisson process bandit on a continuum

AU - Grant, James

AU - Szechtman, Roberto

PY - 2021/12/31

Y1 - 2021/12/31

N2 - We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys \tilde{O}(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.

AB - We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys \tilde{O}(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.

KW - Applied probability

KW - Poisson processes

KW - Multi-armed bandit

KW - Machine learning

U2 - 10.1016/j.ejor.2021.03.033

DO - 10.1016/j.ejor.2021.03.033

M3 - Journal article

VL - 295

SP - 575

EP - 586

JO - European Journal of Operational Research

JF - European Journal of Operational Research

SN - 0377-2217

IS - 2

ER -
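
As a rough illustration of the kind of approach the abstract describes (not the paper's actual algorithm), the sketch below runs a UCB rule over a fixed discretisation of the action space [0, 1] with Poisson-distributed rewards. The reward function `mean_reward`, the choice of roughly T^(1/3) bins, and the confidence radius are all illustrative assumptions; the paper's method additionally uses data-adaptive discretisation and a filtering function, which are omitted here.

```python
import numpy as np

# Hypothetical sketch: UCB over K ~ T^(1/3) bins of [0, 1], each arm paying
# a Poisson-distributed number of points. Under a Lipschitz reward, this
# bin count balances discretisation error against estimation error, which
# is the intuition behind the ~T^(2/3) regret order quoted in the abstract.

rng = np.random.default_rng(0)
T = 10_000
K = max(1, round(T ** (1 / 3)))       # number of bins (illustrative choice)
centres = (np.arange(K) + 0.5) / K    # arm = bin centre in [0, 1]

def mean_reward(x):
    # Invented Lipschitz reward: expected number of revealed points at x.
    return 2.0 + np.sin(2 * np.pi * x)

counts = np.zeros(K)
sums = np.zeros(K)

for t in range(1, T + 1):
    if t <= K:                        # initialisation: play each arm once
        a = t - 1
    else:
        means = sums / counts
        # Heuristic Poisson-style confidence radius (variance ~ mean)
        radius = (np.sqrt(2 * np.maximum(means, 1.0) * np.log(t) / counts)
                  + np.log(t) / counts)
        a = int(np.argmax(means + radius))
    counts[a] += 1
    sums[a] += rng.poisson(mean_reward(centres[a]))

print(f"most-played arm centre: {centres[int(np.argmax(counts))]:.3f}")
```

With this particular reward function the optimum sits at x = 0.25, and the most-played bin concentrates near it as T grows; the confidence radius here is a simple sub-exponential-style heuristic rather than the bound analysed in the paper.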