Bandit Procedures for Designing Patient-Centric Clinical Trials

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Chapter

Publication date: 21/09/2022
Host publication: Springer Series in Supply Chain Management
Editors: Xi Chen, Stefanus Jasin, Cong Shi
Place of Publication: London
Publisher: Springer Nature
Number of pages: 25
ISBN (electronic): 9783031019265
ISBN (print): 9783031019258
Original language: English

Publication series

Name: Springer Series in Supply Chain Management
ISSN (Print): 2365-6395
ISSN (electronic): 2365-6409


Multi-armed bandit problems (MABPs) are a special type of optimal control problem that has been studied in operations research, statistics, machine learning, economics, and other fields. They provide a framework well suited to modeling resource allocation under uncertainty in a wide variety of contexts. Across the existing theoretical literature, the use of bandit models to optimally design clinical trials is one of the most typical motivating applications, where the word “optimally” refers to designing so-called patient-centric trials, which take into account the benefit of the in-trial patients and are therefore considered more ethical by some researchers. Nevertheless, the resulting theory has had little influence on the actual design of clinical trials. In contrast to similar learning problems arising, for instance, in digital marketing, where interventions can be tested on millions of users at negligible cost, clinical trials are about “small data”, as recruiting patients is remarkably expensive and (in many cases) ethically challenging. In this book chapter, we review a variety of operations research and machine learning approaches that lead to algorithms to “solve” the finite-horizon MABP and then interpret them in the context of designing clinical trials. Because of the focus on small sample sizes, we do not use the normal distribution to approximate a binomial distribution, which is a common practice for large samples either “for simplicity” or “for ease of computation”. Solving a MABP essentially means deriving a response-adaptive procedure for allocating patients to arms in a finite-sample experiment with no early stopping. We evaluate and compare the performance of these procedures, including the traditional and still dominant clinical trial design choice: equal fixed randomization.
Our results illustrate how bandit approaches could offer significant advantages, mainly in terms of allocating more patients to better interventions, but still pose important inferential challenges, particularly in terms of their resulting lower statistical power, potential for bias in estimation, and the existence of closed-form test distributions or asymptotic theory. We illustrate some promising modifications to bandit procedures to address power and bias issues, and we reflect upon the open challenges that remain for an increased uptake of bandit models in clinical trials.
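
As a rough illustration of what a response-adaptive procedure looks like next to equal fixed randomization, the following minimal Python sketch simulates a small two-arm trial with binary outcomes. It is not taken from the chapter: Thompson sampling with uniform Beta(1, 1) priors is used here only as one familiar bandit rule, and the function names, success probabilities, and trial size are hypothetical choices made for the example. Outcomes are simulated exactly as Bernoulli draws, with no normal approximation.

```python
import random


def run_trial(success_probs, n_patients, rule, rng):
    """Allocate patients one at a time; return per-arm counts and successes.

    rule is "equal" (equal fixed randomization) or "thompson"
    (Thompson sampling, an illustrative bandit rule chosen for this sketch).
    """
    k = len(success_probs)
    allocated = [0] * k
    successes = [0] * k
    for _patient in range(n_patients):
        if rule == "equal":
            # Each arm is equally likely, regardless of observed responses.
            arm = rng.randrange(k)
        elif rule == "thompson":
            # Draw once from each arm's Beta posterior (Beta(1, 1) prior)
            # and allocate the patient to the arm with the largest draw.
            draws = [
                rng.betavariate(1 + successes[a],
                                1 + allocated[a] - successes[a])
                for a in range(k)
            ]
            arm = max(range(k), key=lambda a: draws[a])
        else:
            raise ValueError(f"unknown rule: {rule}")
        allocated[arm] += 1
        # Binary patient response, simulated exactly as a Bernoulli draw.
        if rng.random() < success_probs[arm]:
            successes[arm] += 1
    return allocated, successes


def mean_share_on_best(success_probs, n_patients, rule, n_reps, seed):
    """Average proportion of patients allocated to the truly best arm."""
    rng = random.Random(seed)
    best = max(range(len(success_probs)), key=lambda a: success_probs[a])
    total = 0.0
    for _rep in range(n_reps):
        allocated, _successes = run_trial(success_probs, n_patients, rule, rng)
        total += allocated[best] / n_patients
    return total / n_reps


if __name__ == "__main__":
    probs = (0.3, 0.6)  # hypothetical success probabilities for two arms
    for rule in ("equal", "thompson"):
        share = mean_share_on_best(probs, n_patients=50, rule=rule,
                                   n_reps=2000, seed=1)
        print(f"{rule}: mean share of patients on the better arm = {share:.3f}")
```

The sketch reports only the patient-benefit side of the comparison (the share of patients on the better arm). It does not address statistical power or estimation bias, which the abstract identifies as the main inferential challenges of bandit procedures.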

Bibliographic note

Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.