Exploration and exploitation in Bayes sequential decision problems

Associated organisational unit

Statistical Artificial Intelligence

Electronic data

2016edwardsphd
Final published version, 1.67 MB, PDF document
Available under license: CC BY-NC-ND

Keywords

Multi-armed bandit problem, Statistical learning, Operational research, Dynamic optimization , Statistical modelling, Dynamic programming, Gittins index, Bayesian statistics

View graph of relations

Research output: Thesis › Doctoral Thesis

Published

Standard

Exploration and exploitation in Bayes sequential decision problems. / Edwards, James.
Lancaster University, 2016. 160 p.

Research output: Thesis › Doctoral Thesis

Bibtex

@phdthesis{5768a9d3c07a417abe2c1027eb9663db,

title = "Exploration and exploitation in Bayes sequential decision problems",

abstract = "Bayes sequential decision problems are an extensive problem class with wide application. They involve taking actions in sequence in a system which has characteristics which are unknown or only partially known. These characteristics can be learnt over time as a result of our actions. Therefore we are faced with a trade-off between choosing actions that give desirable short term outcomes (exploitation) and actions that yield useful information about the system which can be used to improve longer term outcomes (exploration).Gittins indices provide an optimal method for a small but important subclass of these problems. Unfortunately the optimality of index methods does not hold generally and Gittins indices can be impractical to calculate for many problems. This has motivated the search for easy-to-calculate heuristics with general application. One such non-index method is the knowledge gradient heuristic. A thorough investigation of the method is made which identifies crucial weaknesses. Index and non-index variants are developed which avoid these weaknesses.The problem of choosing multiple website elements to present to user is an important problem relevant to many major web-based businesses. A Bayesian multi-armed bandit model is developed which captures the interactions between elements and the dual uncertainties of both user preferences and element quality. The problem has many challenging features but solution methods are proposed that are both easy to implement and which can be adapted to particular applications.Finally, easy-to-use software to calculate Gittins indices for Bernoulli and normal rewards has been developed as part of this thesis and has been made publicly available. The methodology used is presented together with a study of accuracy and speed.",

keywords = "Multi-armed bandit problem, Statistical learning, Operational research, Dynamic optimization , Statistical modelling, Dynamic programming, Gittins index, Bayesian statistics",

author = "James Edwards",

year = "2016",

language = "English",

publisher = "Lancaster University",

school = "Lancaster University",

}

RIS

TY - BOOK

T1 - Exploration and exploitation in Bayes sequential decision problems

AU - Edwards, James

PY - 2016

Y1 - 2016

N2 - Bayes sequential decision problems are an extensive problem class with wide application. They involve taking actions in sequence in a system which has characteristics which are unknown or only partially known. These characteristics can be learnt over time as a result of our actions. Therefore we are faced with a trade-off between choosing actions that give desirable short term outcomes (exploitation) and actions that yield useful information about the system which can be used to improve longer term outcomes (exploration).Gittins indices provide an optimal method for a small but important subclass of these problems. Unfortunately the optimality of index methods does not hold generally and Gittins indices can be impractical to calculate for many problems. This has motivated the search for easy-to-calculate heuristics with general application. One such non-index method is the knowledge gradient heuristic. A thorough investigation of the method is made which identifies crucial weaknesses. Index and non-index variants are developed which avoid these weaknesses.The problem of choosing multiple website elements to present to user is an important problem relevant to many major web-based businesses. A Bayesian multi-armed bandit model is developed which captures the interactions between elements and the dual uncertainties of both user preferences and element quality. The problem has many challenging features but solution methods are proposed that are both easy to implement and which can be adapted to particular applications.Finally, easy-to-use software to calculate Gittins indices for Bernoulli and normal rewards has been developed as part of this thesis and has been made publicly available. The methodology used is presented together with a study of accuracy and speed.

AB - Bayes sequential decision problems are an extensive problem class with wide application. They involve taking actions in sequence in a system which has characteristics which are unknown or only partially known. These characteristics can be learnt over time as a result of our actions. Therefore we are faced with a trade-off between choosing actions that give desirable short term outcomes (exploitation) and actions that yield useful information about the system which can be used to improve longer term outcomes (exploration).Gittins indices provide an optimal method for a small but important subclass of these problems. Unfortunately the optimality of index methods does not hold generally and Gittins indices can be impractical to calculate for many problems. This has motivated the search for easy-to-calculate heuristics with general application. One such non-index method is the knowledge gradient heuristic. A thorough investigation of the method is made which identifies crucial weaknesses. Index and non-index variants are developed which avoid these weaknesses.The problem of choosing multiple website elements to present to user is an important problem relevant to many major web-based businesses. A Bayesian multi-armed bandit model is developed which captures the interactions between elements and the dual uncertainties of both user preferences and element quality. The problem has many challenging features but solution methods are proposed that are both easy to implement and which can be adapted to particular applications.Finally, easy-to-use software to calculate Gittins indices for Bernoulli and normal rewards has been developed as part of this thesis and has been made publicly available. The methodology used is presented together with a study of accuracy and speed.

KW - Multi-armed bandit problem

KW - Statistical learning

KW - Operational research

KW - Dynamic optimization

KW - Statistical modelling

KW - Dynamic programming

KW - Gittins index

KW - Bayesian statistics

M3 - Doctoral Thesis

PB - Lancaster University

ER -

Research

Associated organisational unit

Electronic data

Keywords