Regret bounds for Gaussian process bandit problems

School Of Mathematical Sciences

Associated organisational units

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Regret bounds for Gaussian process bandit problems. / Grunewalder, S.; Audibert, J.-Y.; Opper, M. et al.
Artificial Intelligence and Statistics (AISTATS). 2010. p. 273-280.

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Harvard

Grunewalder, S, Audibert, J-Y, Opper, M & Shawe-Taylor, J 2010, Regret bounds for Gaussian process bandit problems. in Artificial Intelligence and Statistics (AISTATS). pp. 273-280. <http://www.jmlr.org/proceedings/papers/v9/grunewalder10a/grunewalder10a.pdf>

APA

Grunewalder, S., Audibert, J.-Y., Opper, M., & Shawe-Taylor, J. (2010). Regret bounds for Gaussian process bandit problems. In Artificial Intelligence and Statistics (AISTATS) (pp. 273-280) http://www.jmlr.org/proceedings/papers/v9/grunewalder10a/grunewalder10a.pdf

Vancouver

Grunewalder S, Audibert JY, Opper M, Shawe-Taylor J. Regret bounds for Gaussian process bandit problems. In Artificial Intelligence and Statistics (AISTATS). 2010. p. 273-280

Author

Grunewalder, S. ; Audibert, J.-Y. ; Opper, M. et al. / Regret bounds for Gaussian process bandit problems. Artificial Intelligence and Statistics (AISTATS). 2010. pp. 273-280

Bibtex

@inproceedings{855dd1c229544ab5b841a57fa45d8e29,

title = "Regret bounds for Gaussian process bandit problems",

abstract = "Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in generalthere is at most a logarithmic looseness in our upper bounds.",

author = "S. Grunewalder and J.-Y. Audibert and M. Opper and J. Shawe-Taylor",

year = "2010",

language = "English",

pages = "273--280",

booktitle = "Artificial Intelligence and Statistics (AISTATS)",

}

RIS

TY - GEN

T1 - Regret bounds for Gaussian process bandit problems

AU - Grunewalder, S.

AU - Audibert, J.-Y.

AU - Opper, M.

AU - Shawe-Taylor, J.

PY - 2010

Y1 - 2010

N2 - Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in generalthere is at most a logarithmic looseness in our upper bounds.

AB - Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in generalthere is at most a logarithmic looseness in our upper bounds.

M3 - Conference contribution/Paper

SP - 273

EP - 280

BT - Artificial Intelligence and Statistics (AISTATS)

ER -

Research

Associated organisational units

Links