Home > Research > Publications & Outputs > Filtered Poisson Process Bandit on a Continuum

Electronic data

Keywords

View graph of relations

Filtered Poisson Process Bandit on a Continuum

Research output: Contribution to Journal/MagazineJournal article

Published

Standard

Filtered Poisson Process Bandit on a Continuum. / Grant, James A.; Szechtman, Roberto.
In: arXiv.org, 20.07.2020.

Research output: Contribution to Journal/MagazineJournal article

Harvard

APA

Vancouver

Author

Bibtex

@article{36a58e039a1e4573b326804efb6a4e82,
title = "Filtered Poisson Process Bandit on a Continuum",
abstract = " We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor. ",
keywords = "cs.LG, stat.ML",
author = "Grant, {James A.} and Roberto Szechtman",
year = "2020",
month = jul,
day = "20",
language = "English",
journal = "arXiv.org",

}

RIS

TY - JOUR

T1 - Filtered Poisson Process Bandit on a Continuum

AU - Grant, James A.

AU - Szechtman, Roberto

PY - 2020/7/20

Y1 - 2020/7/20

N2 - We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.

AB - We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.

KW - cs.LG

KW - stat.ML

M3 - Journal article

JO - arXiv.org

JF - arXiv.org

ER -