Optimality of LSTD and its relation to MC

School Of Mathematical Sciences

Associated organisational units

Text available via DOI:

https://doi.org/10.1109/IJCNN.2007.4370979
Final published version

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Optimality of LSTD and its relation to MC. / Grunewalder, Steffen; Hochreiter, Sepp; Obermayer, Klaus.
International Joint Conference on Neural Networks, 2007. IJCNN 2007. IEEE, 2007.

Harvard

Grunewalder, S, Hochreiter, S & Obermayer, K 2007, Optimality of LSTD and its relation to MC. in International Joint Conference on Neural Networks, 2007. IJCNN 2007. IEEE. https://doi.org/10.1109/IJCNN.2007.4370979

APA

Grunewalder, S., Hochreiter, S., & Obermayer, K. (2007). Optimality of LSTD and its relation to MC. In International Joint Conference on Neural Networks, 2007. IJCNN 2007. IEEE. https://doi.org/10.1109/IJCNN.2007.4370979

Vancouver

Grunewalder S, Hochreiter S, Obermayer K. Optimality of LSTD and its relation to MC. In International Joint Conference on Neural Networks, 2007. IJCNN 2007. IEEE. 2007. doi: 10.1109/IJCNN.2007.4370979

Author

Grunewalder, Steffen ; Hochreiter, Sepp ; Obermayer, Klaus. / Optimality of LSTD and its relation to MC. International Joint Conference on Neural Networks, 2007. IJCNN 2007. IEEE, 2007.

Bibtex

@inproceedings{c0c102675ed1402ab87942f56e504e0e,

title = "Optimality of LSTD and its relation to MC",

abstract = "In this analytical study we compare the risk of the Monte Carlo (MC) and the least-squares TD (LSTD) estimator. We prove that for the case of acyclic Markov Reward Processes (MRPs) LSTD has minimal risk for any convex loss function in the class of unbiased estimators. When comparing the Monte Carlo estimator, which does not assume a Markov structure, and LSTD, we find that the Monte Carlo estimator is equivalent to LSTD if both estimators have the same amount of information. Theoretical results are supported by an empirical evaluation of the estimators.",

author = "Steffen Grunewalder and Sepp Hochreiter and Klaus Obermayer",

year = "2007",

doi = "10.1109/IJCNN.2007.4370979",

language = "English",

isbn = "9781424413799",

booktitle = "International Joint Conference on Neural Networks, 2007. IJCNN 2007",

publisher = "IEEE",

}

RIS

TY - GEN

T1 - Optimality of LSTD and its relation to MC

AU - Grunewalder, Steffen

AU - Hochreiter, Sepp

AU - Obermayer, Klaus

PY - 2007

Y1 - 2007

N2 - In this analytical study we compare the risk of the Monte Carlo (MC) and the least-squares TD (LSTD) estimator. We prove that for the case of acyclic Markov Reward Processes (MRPs) LSTD has minimal risk for any convex loss function in the class of unbiased estimators. When comparing the Monte Carlo estimator, which does not assume a Markov structure, and LSTD, we find that the Monte Carlo estimator is equivalent to LSTD if both estimators have the same amount of information. Theoretical results are supported by an empirical evaluation of the estimators.

AB - In this analytical study we compare the risk of the Monte Carlo (MC) and the least-squares TD (LSTD) estimator. We prove that for the case of acyclic Markov Reward Processes (MRPs) LSTD has minimal risk for any convex loss function in the class of unbiased estimators. When comparing the Monte Carlo estimator, which does not assume a Markov structure, and LSTD, we find that the Monte Carlo estimator is equivalent to LSTD if both estimators have the same amount of information. Theoretical results are supported by an empirical evaluation of the estimators.

U2 - 10.1109/IJCNN.2007.4370979

DO - 10.1109/IJCNN.2007.4370979

M3 - Conference contribution/Paper

SN - 9781424413799

BT - International Joint Conference on Neural Networks, 2007. IJCNN 2007

PB - IEEE

ER -
