
Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks. / Ez-Zizi, Adnane; Farrell, Simon; Leslie, David Stuart.
Computational Intelligence, 2015 IEEE Symposium Series on. Cape Town: IEEE, 2015. p. 579-586.


Harvard

Ez-Zizi, A, Farrell, S & Leslie, DS 2015, Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks. in Computational Intelligence, 2015 IEEE Symposium Series on. IEEE, Cape Town, pp. 579-586. https://doi.org/10.1109/SSCI.2015.91

APA

Ez-Zizi, A., Farrell, S., & Leslie, D. S. (2015). Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks. In Computational Intelligence, 2015 IEEE Symposium Series on (pp. 579-586). IEEE. https://doi.org/10.1109/SSCI.2015.91

Vancouver

Ez-Zizi A, Farrell S, Leslie DS. Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks. In Computational Intelligence, 2015 IEEE Symposium Series on. Cape Town: IEEE. 2015. p. 579-586. doi: 10.1109/SSCI.2015.91

Author

Ez-Zizi, Adnane ; Farrell, Simon ; Leslie, David Stuart. / Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks. Computational Intelligence, 2015 IEEE Symposium Series on. Cape Town : IEEE, 2015. pp. 579-586

Bibtex

@inproceedings{c5449bfd9b04424db97ee78223db8866,
title = "Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks",
abstract = "We present a Bayesian reinforcement learning model with a working memory module which can solve some non-Markovian decision processes. The model is tested, and compared against SARSA (lambda), on a standard working-memory task from the psychology literature. Our method uses the Kalman temporal difference framework, And its extension to stochastic state transitions, to give posterior distributions over state-action values. This framework provides a natural mechanism for using reward information to update more than the current state-action pair, and thus negates the use of eligibility traces. Furthermore, the existence of full posterior distributions allows the use of Thompson sampling for action selection, which in turn removes the need to choose an appropriately parameterised action-selection method.",
author = "Adnane Ez-Zizi and Simon Farrell and Leslie, {David Stuart}",
year = "2015",
month = dec,
day = "7",
doi = "10.1109/SSCI.2015.91",
language = "English",
isbn = "9781479975600 ",
pages = "579--586",
booktitle = "Computational Intelligence, 2015 IEEE Symposium Series on",
publisher = "IEEE",

}

RIS

TY - GEN

T1 - Bayesian Reinforcement Learning in Markovian and non-Markovian Tasks

AU - Ez-Zizi, Adnane

AU - Farrell, Simon

AU - Leslie, David Stuart

PY - 2015/12/7

Y1 - 2015/12/7

N2 - We present a Bayesian reinforcement learning model with a working memory module which can solve some non-Markovian decision processes. The model is tested, and compared against SARSA(lambda), on a standard working-memory task from the psychology literature. Our method uses the Kalman temporal difference framework, and its extension to stochastic state transitions, to give posterior distributions over state-action values. This framework provides a natural mechanism for using reward information to update more than the current state-action pair, and thus negates the use of eligibility traces. Furthermore, the existence of full posterior distributions allows the use of Thompson sampling for action selection, which in turn removes the need to choose an appropriately parameterised action-selection method.

AB - We present a Bayesian reinforcement learning model with a working memory module which can solve some non-Markovian decision processes. The model is tested, and compared against SARSA(lambda), on a standard working-memory task from the psychology literature. Our method uses the Kalman temporal difference framework, and its extension to stochastic state transitions, to give posterior distributions over state-action values. This framework provides a natural mechanism for using reward information to update more than the current state-action pair, and thus negates the use of eligibility traces. Furthermore, the existence of full posterior distributions allows the use of Thompson sampling for action selection, which in turn removes the need to choose an appropriately parameterised action-selection method.

U2 - 10.1109/SSCI.2015.91

DO - 10.1109/SSCI.2015.91

M3 - Conference contribution/Paper

SN - 9781479975600

SP - 579

EP - 586

BT - Computational Intelligence, 2015 IEEE Symposium Series on

PB - IEEE

CY - Cape Town

ER -
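
Note on the approach described in the abstract: the paper's learner maintains Gaussian posterior distributions over state-action values via the Kalman temporal difference framework and selects actions by Thompson sampling. The sketch below is a rough illustration of that general combination only; it is not the authors' model, it omits the working-memory module and the extension to stochastic state transitions, and the class name, one-hot feature map and noise parameters are assumptions made for the example.

# A minimal sketch of Kalman temporal-difference (KTD) learning with Thompson
# sampling. Assumptions not taken from the paper: a tabular one-hot feature
# map, a SARSA-style observation model, and arbitrary noise parameters.
import numpy as np


class KalmanTDAgent:
    def __init__(self, n_states, n_actions, gamma=0.95,
                 prior_var=1.0, process_var=1e-3, obs_var=1.0):
        self.n_states, self.n_actions, self.gamma = n_states, n_actions, gamma
        d = n_states * n_actions
        self.mu = np.zeros(d)            # posterior mean over Q-value weights
        self.P = prior_var * np.eye(d)   # posterior covariance
        self.process_var = process_var   # random-walk drift of the weights
        self.obs_var = obs_var           # reward observation noise

    def _phi(self, s, a):
        """One-hot feature vector for the (state, action) pair."""
        x = np.zeros(self.n_states * self.n_actions)
        x[s * self.n_actions + a] = 1.0
        return x

    def select_action(self, s):
        """Thompson sampling: draw one Q-function from the posterior and act
        greedily with respect to the draw."""
        theta = np.random.multivariate_normal(self.mu, self.P)
        q = [self._phi(s, a) @ theta for a in range(self.n_actions)]
        return int(np.argmax(q))

    def update(self, s, a, r, s_next, a_next, done):
        """Kalman filter step: the reward is the observation, and the
        observation vector couples the current and successor state-action
        pairs, so one reward updates every correlated weight at once."""
        h = self._phi(s, a)
        if not done:
            h = h - self.gamma * self._phi(s_next, a_next)
        # Prediction: let the weights drift slowly (random-walk evolution).
        P_pred = self.P + self.process_var * np.eye(len(self.mu))
        # Correction: standard Kalman gain and innovation.
        innovation = r - h @ self.mu
        s_var = h @ P_pred @ h + self.obs_var
        gain = P_pred @ h / s_var
        self.mu = self.mu + gain * innovation
        self.P = P_pred - np.outer(gain, h @ P_pred)

A learning loop would alternate select_action, an environment step, and update with the observed reward and the sampled next action. Because the full covariance matrix couples all weights, a single reward can revise value estimates beyond the state-action pair just visited, which is the property the abstract contrasts with eligibility traces; posterior sampling likewise replaces a hand-tuned exploration schedule such as epsilon-greedy or softmax.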