Optimality of LSTD and its relation to MC

Mathematics and Statistics

Associated organisational units

Text available via DOI:

https://doi.org/10.1109/IJCNN.2007.4370979
Final published version

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Steffen Grunewalder
Sepp Hochreiter
Klaus Obermayer

Publication date	2007
Host publication	International Joint Conference on Neural Networks, 2007. IJCNN 2007
Publisher	IEEE
ISBN (electronic)	9781424413805
ISBN (print)	9781424413799
Original language	English

Abstract

In this analytical study we compare the risk of the Monte Carlo (MC) and the least-squares TD (LSTD) estimator. We prove that for the case of acyclic Markov Reward Processes (MRPs) LSTD has minimal risk for any convex loss function in the class of unbiased estimators. When comparing the Monte Carlo estimator, which does not assume a Markov structure, and LSTD, we find that the Monte Carlo estimator is equivalent to LSTD if both estimators have the same amount of information. Theoretical results are supported by an empirical evaluation of the estimators.
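To make the two estimators in the abstract concrete, here is a minimal sketch in Python. It is not taken from the paper: it assumes a tabular (one-hot) state representation, an undiscounted return, and that every state is visited at least once. The function names, the `(state, reward, next_state)` episode format, and the toy data are hypothetical choices for illustration only.

```python
GAMMA = 1.0  # undiscounted returns; an assumption of this sketch


def mc_estimate(episodes, n_states, gamma=GAMMA):
    """Monte Carlo: average the returns observed after each visit to a state."""
    totals = [0.0] * n_states
    counts = [0] * n_states
    for episode in episodes:
        ret = 0.0
        # Walk backwards so `ret` is the return from each state onwards.
        for state, reward, _next_state in reversed(episode):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstd_estimate(episodes, n_states, gamma=GAMMA):
    """Tabular LSTD: accumulate A = sum phi(s)(phi(s) - gamma*phi(s'))^T and
    b = sum phi(s)*r with one-hot features, then solve A v = b."""
    A = [[0.0] * n_states for _ in range(n_states)]
    b = [0.0] * n_states
    for episode in episodes:
        for state, reward, next_state in episode:
            A[state][state] += 1.0
            if next_state is not None:  # None marks termination
                A[state][next_state] -= gamma
            b[state] += reward
    return solve(A, b)


# Hypothetical toy data on an acyclic MRP with states 0 -> 1 -> end.
# The second episode starts in state 1, so it carries no return for state 0.
episodes = [
    [(0, 0.0, 1), (1, 1.0, None)],
    [(1, 0.0, None)],
]
mc = mc_estimate(episodes, n_states=2)
lstd = lstd_estimate(episodes, n_states=2)
print("MC:  ", mc)    # [1.0, 0.5]
print("LSTD:", lstd)  # [0.5, 0.5]
```

On this toy data the two estimates agree for state 1 but differ for state 0. MC uses only the single return observed from state 0, whereas LSTD also propagates the second episode's outcome back through the observed transition from state 0 to state 1.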
