Stochastic Models for Dynamic Resource Allocation

Management Science

Associated organisational units

Electronic data

2018yarahmadiphd
Final published version, 4.28 MB, PDF document

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/1924
Final published version

Research output: Thesis › Doctoral Thesis

Published

Amin Yarahmadi

Publication date	22/02/2023
Number of pages	197
Qualification	PhD
Awarding Institution	Lancaster University
Supervisors/Advisors	Jacko, Peter (Supervisor); Glazebrook, Kevin (Supervisor)
Award date	19/12/2022
Publisher	Lancaster University
Original language	English

Abstract

Determining the efficacy of a novel intervention is vital before making it available to the public. The standard equal fixed randomisation procedure in the design of (static) experiments leads to an unbiased Maximum Likelihood Estimator (MLE) for each intervention. However, this approach results in a heavily suboptimal cumulative reward. It also imposes limitations in some situations, especially for rare diseases, where it is desirable to design a clinical trial on a small number of subjects while treating them as well as possible. This motivates the use of response-adaptive procedures, in which the allocation ratios to each arm can be skewed toward the better-performing intervention as subject responses become available. Hence, we consider the Bayesian Beta-Bernoulli finite-horizon two-armed bandit problem with binary responses and the objective of maximising the Bayes-expected total number of subject successes in the trial, which we call the subject benefit.

Dynamic programming (DP), with a memory-efficient implementation, is used as the solution method for the proposed model to derive the randomised designs. Regardless of the type of randomisation procedure, the MLE is estimated in a frequentist way using DP-based solutions at the end of the trial.
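As a rough illustration of the model described above, the following Python sketch computes the Bayes-expected number of successes (the subject benefit) for a two-armed Beta-Bernoulli bandit by backward induction over the posterior state. It is a minimal textbook-style recursion, not the memory-efficient implementation developed in the thesis; the function name `bayes_optimal_value`, the state encoding (successes and failures per arm), and the default uniform Beta(1, 1) prior on both arms are assumptions made for this example.

```python
from functools import lru_cache


def bayes_optimal_value(horizon, prior=(1.0, 1.0)):
    """Bayes-expected total successes under the optimal allocation.

    Assumption: both arms share the same Beta(a0, b0) prior.
    """
    a0, b0 = prior

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # State: observed successes/failures on arm 1 and arm 2.
        if s1 + f1 + s2 + f2 >= horizon:
            return 0.0  # no subjects left to allocate
        # Posterior mean success probability of each arm.
        p1 = (a0 + s1) / (a0 + b0 + s1 + f1)
        p2 = (a0 + s2) / (a0 + b0 + s2 + f2)
        # Expected reward-to-go if the next subject is allocated to arm 1 or 2.
        q1 = (p1 * (1.0 + value(s1 + 1, f1, s2, f2))
              + (1.0 - p1) * value(s1, f1 + 1, s2, f2))
        q2 = (p2 * (1.0 + value(s1, f1, s2 + 1, f2))
              + (1.0 - p2) * value(s1, f1, s2, f2 + 1))
        return max(q1, q2)

    return value(0, 0, 0, 0)


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, bayes_optimal_value(n))
```

The memoised recursion stores one value per reachable state, so memory grows roughly with the fourth power of the horizon; this is the cost that a memory-efficient implementation would aim to reduce.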

We first evaluate the bias of the MLE and show that it is unacceptably high and variable due to the model's adaptiveness. We propose a new augmented estimator that aims to mitigate the estimation bias while the DP actions remain deterministic. Moreover, by modifying the allocation decision at every time step, we introduce two novel allocation procedures that mitigate the bias induced by the DP procedure: (i) DP using an augmented estimator, which adds a number of pseudo-successes to the worse-performing intervention, and (ii) a randomised DP procedure, which perturbs the Bayes-optimal allocation decision with a given probability.
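The bias evaluation and the randomised DP idea can be sketched with a small Monte Carlo simulation. The abstract does not specify the exact perturbation scheme, so this example assumes the simplest reading: with probability `epsilon` the Bayes-optimal arm is swapped for the other arm. The names `make_policy` and `estimate_mle_bias`, the uniform Beta(1, 1) priors, and the choice to skip an arm's MLE in trials where that arm was never allocated are all assumptions made for illustration, not the thesis's procedure.

```python
import random
from functools import lru_cache


def make_policy(horizon, prior=(1.0, 1.0)):
    """Return a function mapping a state to the Bayes-optimal arm (0 or 1)."""
    a0, b0 = prior

    def post_mean(s, f):
        return (a0 + s) / (a0 + b0 + s + f)

    @lru_cache(maxsize=None)
    def q_values(s1, f1, s2, f2):
        if s1 + f1 + s2 + f2 >= horizon:
            return (0.0, 0.0)
        p1, p2 = post_mean(s1, f1), post_mean(s2, f2)
        q1 = (p1 * (1.0 + max(q_values(s1 + 1, f1, s2, f2)))
              + (1.0 - p1) * max(q_values(s1, f1 + 1, s2, f2)))
        q2 = (p2 * (1.0 + max(q_values(s1, f1, s2 + 1, f2)))
              + (1.0 - p2) * max(q_values(s1, f1, s2, f2 + 1)))
        return (q1, q2)

    def optimal_arm(state):
        q1, q2 = q_values(*state)
        return 0 if q1 >= q2 else 1  # ties broken toward arm 0 (assumption)

    return optimal_arm


def estimate_mle_bias(true_p, horizon, epsilon, n_trials, seed=0):
    """Monte Carlo estimate of the MLE bias (mean MLE minus truth) per arm."""
    rng = random.Random(seed)
    optimal_arm = make_policy(horizon)
    sums, counts = [0.0, 0.0], [0, 0]
    for _ in range(n_trials):
        s, f = [0, 0], [0, 0]
        for _ in range(horizon):
            arm = optimal_arm((s[0], f[0], s[1], f[1]))
            if rng.random() < epsilon:
                arm = 1 - arm  # perturb the Bayes-optimal decision
            if rng.random() < true_p[arm]:
                s[arm] += 1
            else:
                f[arm] += 1
        for arm in (0, 1):
            n = s[arm] + f[arm]
            if n > 0:  # the MLE is undefined for an arm never allocated
                sums[arm] += s[arm] / n
                counts[arm] += 1
    return tuple(
        sums[a] / counts[a] - true_p[a] if counts[a] else float("nan")
        for a in (0, 1)
    )


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.5):
        print(eps, estimate_mle_bias((0.3, 0.7), 20, eps, 2000, seed=1))
```

Under this flipping assumption, `epsilon = 0` gives the deterministic DP design, while `epsilon = 0.5` makes the allocation independent of the observed responses, which is the setting in which the per-arm MLE is unbiased. Intermediate values trade subject benefit against estimation bias.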

Lastly, another DP design is proposed that places an interim analysis in the middle of the trial, for which some novel and non-trivial stopping criteria have been developed. The interim look can be implemented either in the simulation step alone or, identically, in both the DP procedure and the simulation step.

We evaluate the proposed designs via extensive simulation studies in a broad range of scenarios. This thesis addresses some key issues in the trade-off between reducing the estimation bias and improving the subject benefit in bandit models, a trade-off that can be considered a limitation preventing bandit models from being implemented in practice.
