Research output: Contribution to Journal/Magazine › Journal article

Published

In: arxiv.org, 26.05.2017.

Grant, JA, Leslie, DS, Glazebrook, K & Szechtman, R 2017, 'Combinatorial Multi-Armed Bandits with Filtered Feedback', *arxiv.org*. <https://arxiv.org/abs/1705.09605>

Grant, J. A., Leslie, D. S., Glazebrook, K., & Szechtman, R. (2017). Combinatorial Multi-Armed Bandits with Filtered Feedback. *arxiv.org*. https://arxiv.org/abs/1705.09605

Grant JA, Leslie DS, Glazebrook K, Szechtman R. Combinatorial Multi-Armed Bandits with Filtered Feedback. arxiv.org. 2017 May 26.

@article{53d3c08d3163407eb3fcc10a7e5e52a9,

title = "Combinatorial Multi-Armed Bandits with Filtered Feedback",

abstract = "Motivated by problems in search and detection, we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set $\{1,...,k\}$ in each round, generating random outcomes from probability distributions associated with these arms and receiving an overall reward. Under semibandit feedback it is assumed that all of the random outcomes generated are observed. Filtered semibandit feedback allows the observed outcomes to be sampled from a second distribution, conditioned on the initial random outcomes. This feedback mechanism is valuable because it allows CMAB methods to be applied to sequential search and detection problems in which combinatorial actions are taken but the true rewards (the number of objects of interest appearing in the round) are not observed; rather, a filtered reward (the number of objects the searcher successfully finds, which by definition can be no greater than the number that appear) is observed. We present an upper confidence bound (UCB) type algorithm, Robust-F-CUCB, with an associated regret bound of order $\mathcal{O}(\ln(n))$, balancing exploration and exploitation in the face of both filtering of reward and heavy-tailed reward distributions.",

keywords = "cs.LG, stat.ML",

author = "Grant, {James A.} and Leslie, {David S.} and Kevin Glazebrook and Roberto Szechtman",

note = "16 pages",

year = "2017",

month = may,

day = "26",

language = "English",

journal = "arxiv.org",

}

TY - JOUR

T1 - Combinatorial Multi-Armed Bandits with Filtered Feedback

AU - Grant, James A.

AU - Leslie, David S.

AU - Glazebrook, Kevin

AU - Szechtman, Roberto

N1 - 16 pages

PY - 2017/5/26

Y1 - 2017/5/26

N2 - Motivated by problems in search and detection, we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set $\{1,...,k\}$ in each round, generating random outcomes from probability distributions associated with these arms and receiving an overall reward. Under semibandit feedback it is assumed that all of the random outcomes generated are observed. Filtered semibandit feedback allows the observed outcomes to be sampled from a second distribution, conditioned on the initial random outcomes. This feedback mechanism is valuable because it allows CMAB methods to be applied to sequential search and detection problems in which combinatorial actions are taken but the true rewards (the number of objects of interest appearing in the round) are not observed; rather, a filtered reward (the number of objects the searcher successfully finds, which by definition can be no greater than the number that appear) is observed. We present an upper confidence bound (UCB) type algorithm, Robust-F-CUCB, with an associated regret bound of order $\mathcal{O}(\ln(n))$, balancing exploration and exploitation in the face of both filtering of reward and heavy-tailed reward distributions.

AB - Motivated by problems in search and detection, we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set $\{1,...,k\}$ in each round, generating random outcomes from probability distributions associated with these arms and receiving an overall reward. Under semibandit feedback it is assumed that all of the random outcomes generated are observed. Filtered semibandit feedback allows the observed outcomes to be sampled from a second distribution, conditioned on the initial random outcomes. This feedback mechanism is valuable because it allows CMAB methods to be applied to sequential search and detection problems in which combinatorial actions are taken but the true rewards (the number of objects of interest appearing in the round) are not observed; rather, a filtered reward (the number of objects the searcher successfully finds, which by definition can be no greater than the number that appear) is observed. We present an upper confidence bound (UCB) type algorithm, Robust-F-CUCB, with an associated regret bound of order $\mathcal{O}(\ln(n))$, balancing exploration and exploitation in the face of both filtering of reward and heavy-tailed reward distributions.

KW - cs.LG

KW - stat.ML

M3 - Journal article

JO - arxiv.org

JF - arxiv.org

ER -
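
The filtered semibandit setting described in the abstract can be sketched in a short simulation. This is an illustrative toy, not the paper's Robust-F-CUCB: it uses a plain empirical-mean CUCB rule rather than the robust estimators the paper develops for heavy tails, and the Poisson outcome model, the detection probability `p_detect`, and the arm rates `lams` are assumptions chosen for the search-and-detection interpretation (objects appear per arm; the searcher detects each one independently, so the observed count is a binomially filtered version of the true count).

```python
import math
import random

def poisson(rng, lam):
    # Knuth's method for sampling a Poisson(lam) variate.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def filtered_cmab_sim(k=5, m=2, rounds=2000, seed=0):
    """Toy CMAB with filtered semibandit feedback: in each round the agent
    pulls m of k arms; each pulled arm generates a hidden Poisson outcome
    (objects appearing), but the agent only observes a Binomial filter of it
    (objects actually found). A simple CUCB index guides arm selection."""
    rng = random.Random(seed)
    lams = [0.5 + 0.3 * i for i in range(k)]  # assumed true Poisson rates
    p_detect = 0.7                            # assumed per-object detection prob.
    counts = [0] * k       # pulls per arm
    means = [0.0] * k      # empirical mean of *observed* (filtered) outcomes
    total = 0.0            # cumulative observed reward
    for t in range(1, rounds + 1):
        # UCB index on the filtered means; unpulled arms get priority.
        ucb = [means[i] + math.sqrt(2 * math.log(t) / counts[i])
               if counts[i] else float("inf") for i in range(k)]
        chosen = sorted(range(k), key=lambda i: ucb[i], reverse=True)[:m]
        for i in chosen:
            true_outcome = poisson(rng, lams[i])                       # hidden
            observed = sum(rng.random() < p_detect
                           for _ in range(true_outcome))               # filtered
            counts[i] += 1
            means[i] += (observed - means[i]) / counts[i]
            total += observed
    return chosen, counts, total
```

Because the filter is applied independently per object, the observed mean for arm $i$ is $p \lambda_i$, so ranking arms by filtered means still recovers the ordering of the true rates; handling heavy-tailed (rather than Poisson) outcomes is where the paper's robust estimation comes in.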