Rights statement: This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of the Operational Research Society on 20 Feb 2019 available online: https://www.tandfonline.com/doi/full/10.1080/01605682.2018.1546650
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Selecting Multiple Web Adverts - a Contextual Multi-armed Bandit with State Uncertainty
AU - Leslie, David Stuart
AU - Edwards, James Anthony
N1 - This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of the Operational Research Society on 20 Feb 2019 available online: https://www.tandfonline.com/doi/full/10.1080/01605682.2018.1546650
PY - 2020/1/2
Y1 - 2020/1/2
N2 - We present a method to solve the problem of choosing a set of adverts to display to each of a sequence of web users. The objective is to maximise user clicks over time and to do so we must learn about the quality of each advert in an online manner by observing user clicks. We formulate the problem as a novel variant of a contextual combinatorial multi-armed bandit problem. The context takes the form of a probability distribution over the user's latent topic preference, and rewards are a particular nonlinear function of the selected set and the context. These features ensure that optimal sets of adverts are appropriately diverse. We give a flexible solution method which combines submodular optimisation with existing bandit index policies. User state uncertainty creates ambiguity in interpreting user feedback which prohibits exact Bayesian updating, but we give an approximate method that is shown to work well.
AB - We present a method to solve the problem of choosing a set of adverts to display to each of a sequence of web users. The objective is to maximise user clicks over time and to do so we must learn about the quality of each advert in an online manner by observing user clicks. We formulate the problem as a novel variant of a contextual combinatorial multi-armed bandit problem. The context takes the form of a probability distribution over the user's latent topic preference, and rewards are a particular nonlinear function of the selected set and the context. These features ensure that optimal sets of adverts are appropriately diverse. We give a flexible solution method which combines submodular optimisation with existing bandit index policies. User state uncertainty creates ambiguity in interpreting user feedback which prohibits exact Bayesian updating, but we give an approximate method that is shown to work well.
U2 - 10.1080/01605682.2018.1546650
DO - 10.1080/01605682.2018.1546650
M3 - Journal article
VL - 71
SP - 100
EP - 116
JO - Journal of the Operational Research Society
JF - Journal of the Operational Research Society
SN - 0160-5682
IS - 1
ER -