- opstok
Accepted author manuscript, 423 KB, PDF document

Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

- http://proceedings.mlr.press/v54/pike-burke17a.html
Final published version

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. 2017. p. 1114-1122 (Proceedings of Machine Learning Research; Vol. 54).

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Pike-Burke, C & Grunewalder, S 2017, Optimistic planning for the stochastic knapsack problem. in *Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.* Proceedings of Machine Learning Research, vol. 54, pp. 1114-1122. <http://proceedings.mlr.press/v54/pike-burke17a.html>

Pike-Burke, C., & Grunewalder, S. (2017). Optimistic planning for the stochastic knapsack problem. In *Proceedings of the 20th International Conference on Artificial Intelligence and Statistics *(pp. 1114-1122). (Proceedings of Machine Learning Research; Vol. 54). http://proceedings.mlr.press/v54/pike-burke17a.html

Pike-Burke C, Grunewalder S. Optimistic planning for the stochastic knapsack problem. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. 2017. p. 1114-1122. (Proceedings of Machine Learning Research).

@inproceedings{ee1857e440ba4134a3730d2911536ce6,

title = "Optimistic planning for the stochastic knapsack problem",

abstract = "The stochastic knapsack problem is a stochastic resource allocation problem that arises frequently and yet is exceptionally hard to solve. We derive and study an optimistic planning algorithm specifically designed for the stochastic knapsack problem. Unlike other optimistic planning algorithms for MDPs, our algorithm, OpStoK, avoids the use of discounting and is adaptive to the amount of resources available. We achieve this behavior by means of a concentration inequality that simultaneously applies to capacity and reward estimates. Crucially, we are able to guarantee that the aforementioned confidence regions hold collectively over all time steps by an application of Doob{\textquoteright}s inequality. We demonstrate that the method returns an εε-optimal solution to the stochastic knapsack problem with high probability. To the best of our knowledge, our algorithm is the first which provides such guarantees for the stochastic knapsack problem. Furthermore, our algorithm is an anytime algorithm and will return a good solution even if stopped prematurely. This is particularly important given the difficulty of the problem. We also provide theoretical conditions to guarantee OpStoK does not expand all policies and demonstrate favorable performance in a simple experimental setting.",

author = "Ciara Pike-Burke and Steffen Grunewalder",

year = "2017",

month = apr,

day = "20",

language = "English",

series = "Proceedings of Machine Learning Research",

pages = "1114--1122",

booktitle = "Proceedings of the 20th International Conference on Artificial Intelligence and Statistics",

}

TY - GEN

T1 - Optimistic planning for the stochastic knapsack problem

AU - Pike-Burke, Ciara

AU - Grunewalder, Steffen

PY - 2017/4/20

Y1 - 2017/4/20

N2 - The stochastic knapsack problem is a stochastic resource allocation problem that arises frequently and yet is exceptionally hard to solve. We derive and study an optimistic planning algorithm specifically designed for the stochastic knapsack problem. Unlike other optimistic planning algorithms for MDPs, our algorithm, OpStoK, avoids the use of discounting and is adaptive to the amount of resources available. We achieve this behavior by means of a concentration inequality that simultaneously applies to capacity and reward estimates. Crucially, we are able to guarantee that the aforementioned confidence regions hold collectively over all time steps by an application of Doob’s inequality. We demonstrate that the method returns an εε-optimal solution to the stochastic knapsack problem with high probability. To the best of our knowledge, our algorithm is the first which provides such guarantees for the stochastic knapsack problem. Furthermore, our algorithm is an anytime algorithm and will return a good solution even if stopped prematurely. This is particularly important given the difficulty of the problem. We also provide theoretical conditions to guarantee OpStoK does not expand all policies and demonstrate favorable performance in a simple experimental setting.

AB - The stochastic knapsack problem is a stochastic resource allocation problem that arises frequently and yet is exceptionally hard to solve. We derive and study an optimistic planning algorithm specifically designed for the stochastic knapsack problem. Unlike other optimistic planning algorithms for MDPs, our algorithm, OpStoK, avoids the use of discounting and is adaptive to the amount of resources available. We achieve this behavior by means of a concentration inequality that simultaneously applies to capacity and reward estimates. Crucially, we are able to guarantee that the aforementioned confidence regions hold collectively over all time steps by an application of Doob’s inequality. We demonstrate that the method returns an εε-optimal solution to the stochastic knapsack problem with high probability. To the best of our knowledge, our algorithm is the first which provides such guarantees for the stochastic knapsack problem. Furthermore, our algorithm is an anytime algorithm and will return a good solution even if stopped prematurely. This is particularly important given the difficulty of the problem. We also provide theoretical conditions to guarantee OpStoK does not expand all policies and demonstrate favorable performance in a simple experimental setting.

M3 - Conference contribution/Paper

T3 - Proceedings of Machine Learning Research

SP - 1114

EP - 1122

BT - Proceedings of the 20th International Conference on Artificial Intelligence and Statistics

ER -