The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths

Research output: Working paper

Published

Standard

The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths. / Jacko, Peter.
Lancaster: Lancaster University Management School, 2019. (Management Science Working Paper; Vol. 2019, No. 3).

Harvard

Jacko, P 2019 'The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths' Management Science Working Paper, no. 3, vol. 2019, Lancaster University Management School, Lancaster.

APA

Jacko, P. (2019). The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths. (Management Science Working Paper; Vol. 2019, No. 3). Lancaster University Management School.

Vancouver

Jacko P. The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths. Lancaster: Lancaster University Management School. 2019 Jun 1. (Management Science Working Paper; 3).

Author

Jacko, Peter. / The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths. Lancaster: Lancaster University Management School, 2019. (Management Science Working Paper; 3).

Bibtex

@techreport{4135009c26c84b8a9746a9ff4d5332d9,
title = "The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths",
abstract = "In this paper we consider the two-armed bandit problem, which often arises naturally in its own right or as a subproblem in multi-armed generalizations, and which serves as a starting point for introducing additional problem features. The consideration of binary responses is motivated by their widespread applicability and by this being one of the most studied settings. We focus on the undiscounted finite-horizon objective, which is the most relevant in many applications. We attempt to unify the terminology, which differs across the disciplines that have considered this problem, and present a unified model cast in the Markov decision process framework, with subject responses modelled using the Bernoulli distribution and the corresponding Beta distribution used for Bayesian updating. We give an extensive account of the history and state of the art of approaches from several disciplines, including design of experiments, Bayesian decision theory, naive designs, reinforcement learning, biostatistics, and combination designs. We evaluate these designs, together with a few newly proposed ones, by accurate computation (using a package newly written by the author in the Julia programming language) in order to compare their performance. We show that conclusions for moderate horizons (typical in practice) differ from those for small horizons (typical in the academic literature reporting computational results). We further list and clarify a number of myths about this problem; e.g., we show that, computationally, much larger problems can be designed to Bayes-optimality than is commonly believed.",
keywords = "Multi-armed bandit problem, Design of sequential experiments, Bayesian decision theory, Dynamic programming, Index rules, Response-adaptive randomization",
author = "Peter Jacko",
year = "2019",
month = jun,
day = "1",
language = "English",
series = "Management Science Working Paper",
publisher = "Lancaster University Management School",
number = "3",
type = "WorkingPaper",
institution = "Lancaster University Management School",

}
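The abstract describes a Markov decision process with Bernoulli responses, Beta-distribution Bayesian updating, and an undiscounted finite-horizon objective solved to Bayes-optimality by dynamic programming. The following is a minimal Python sketch of that backward-induction idea, not the author's Julia package; the function name, the Beta(1, 1) prior, and the horizon are illustrative assumptions.

```python
# Minimal sketch: Bayes-optimal design for the finite-horizon two-armed
# Bernoulli bandit with independent Beta priors, via backward induction.
from functools import lru_cache

def bayes_optimal_value(horizon, prior=(1, 1)):
    """Expected total number of successes under the Bayes-optimal
    allocation, starting from Beta(prior) beliefs on both arms."""
    a0, b0 = prior

    @lru_cache(maxsize=None)
    def V(s1, f1, s2, f2, t):
        # State: observed (successes, failures) per arm; t pulls remain.
        if t == 0:
            return 0.0
        # Posterior mean success probability of each arm (Beta-Bernoulli).
        p1 = (a0 + s1) / (a0 + b0 + s1 + f1)
        p2 = (a0 + s2) / (a0 + b0 + s2 + f2)
        # Value of pulling each arm: immediate expected success plus the
        # value of the updated belief state.
        v1 = p1 * (1 + V(s1 + 1, f1, s2, f2, t - 1)) \
            + (1 - p1) * V(s1, f1 + 1, s2, f2, t - 1)
        v2 = p2 * (1 + V(s1, f1, s2 + 1, f2, t - 1)) \
            + (1 - p2) * V(s1, f1, s2, f2 + 1, t - 1)
        return max(v1, v2)

    return V(0, 0, 0, 0, horizon)

print(bayes_optimal_value(10))
```

With a uniform Beta(1, 1) prior, any non-adaptive design earns horizon/2 successes in expectation, so the Bayes-optimal value strictly exceeds that for horizons of two or more; the state space grows only polynomially in the horizon, which is what makes exact computation feasible for far larger horizons than commonly assumed.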

RIS

TY - UNPB

T1 - The Finite-Horizon Two-Armed Bandit Problem with Binary Responses

T2 - A Multidisciplinary Survey of the History, State of the Art, and Myths

AU - Jacko, Peter

PY - 2019/6/1

Y1 - 2019/6/1

N2 - In this paper we consider the two-armed bandit problem, which often arises naturally in its own right or as a subproblem in multi-armed generalizations, and which serves as a starting point for introducing additional problem features. The consideration of binary responses is motivated by their widespread applicability and by this being one of the most studied settings. We focus on the undiscounted finite-horizon objective, which is the most relevant in many applications. We attempt to unify the terminology, which differs across the disciplines that have considered this problem, and present a unified model cast in the Markov decision process framework, with subject responses modelled using the Bernoulli distribution and the corresponding Beta distribution used for Bayesian updating. We give an extensive account of the history and state of the art of approaches from several disciplines, including design of experiments, Bayesian decision theory, naive designs, reinforcement learning, biostatistics, and combination designs. We evaluate these designs, together with a few newly proposed ones, by accurate computation (using a package newly written by the author in the Julia programming language) in order to compare their performance. We show that conclusions for moderate horizons (typical in practice) differ from those for small horizons (typical in the academic literature reporting computational results). We further list and clarify a number of myths about this problem; e.g., we show that, computationally, much larger problems can be designed to Bayes-optimality than is commonly believed.

AB - In this paper we consider the two-armed bandit problem, which often arises naturally in its own right or as a subproblem in multi-armed generalizations, and which serves as a starting point for introducing additional problem features. The consideration of binary responses is motivated by their widespread applicability and by this being one of the most studied settings. We focus on the undiscounted finite-horizon objective, which is the most relevant in many applications. We attempt to unify the terminology, which differs across the disciplines that have considered this problem, and present a unified model cast in the Markov decision process framework, with subject responses modelled using the Bernoulli distribution and the corresponding Beta distribution used for Bayesian updating. We give an extensive account of the history and state of the art of approaches from several disciplines, including design of experiments, Bayesian decision theory, naive designs, reinforcement learning, biostatistics, and combination designs. We evaluate these designs, together with a few newly proposed ones, by accurate computation (using a package newly written by the author in the Julia programming language) in order to compare their performance. We show that conclusions for moderate horizons (typical in practice) differ from those for small horizons (typical in the academic literature reporting computational results). We further list and clarify a number of myths about this problem; e.g., we show that, computationally, much larger problems can be designed to Bayes-optimality than is commonly believed.

KW - Multi-armed bandit problem

KW - Design of sequential experiments

KW - Bayesian decision theory

KW - Dynamic programming

KW - Index rules

KW - Response-adaptive randomization

M3 - Working paper

T3 - Management Science Working Paper

BT - The Finite-Horizon Two-Armed Bandit Problem with Binary Responses

PB - Lancaster University Management School

CY - Lancaster

ER -