This thesis studies sequential decision problems motivated by the challenge of selecting questions to give to students in an online educational environment. Online education offers the potential to develop personalized and adaptive learning environments, in which students receive individualized sequences of questions that update as the student is observed to be struggling or flourishing. To achieve this personalization, we must learn how good each question is while simultaneously giving students good questions. Multi-armed bandits are a popular technique for sequential decision making under uncertainty. Due to their online nature and their ability to balance the trade-off between exploration and exploitation, multi-armed bandits lend themselves naturally to this problem of adaptively selecting questions in educational software. However, due to the complexity of the educational problem, standard approaches to multi-armed bandits cannot be applied directly. In this thesis, variants of the multi-armed bandit problem specifically motivated by the issues arising in the educational domain are considered. Particular focus is placed on the statistical and mathematical foundations of such approaches.