Construction of approximation spaces for reinforcement learning

School Of Mathematical Sciences

Associated organisational units

Keywords

reinforcement learning, diffusion distance, proto value functions, slow feature analysis, least-squares policy iteration, visual robot navigation

View graph of relations

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Construction of approximation spaces for reinforcement learning. / Böhmer, Wendelin; Grunewalder, Steffen; Shen, Yun et al.
In: Journal of Machine Learning Research, Vol. 14, 2013, p. 2067-2118.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Harvard

Böhmer, W, Grunewalder, S, Shen, Y, Musial, M & Obermayer, K 2013, 'Construction of approximation spaces for reinforcement learning', Journal of Machine Learning Research, vol. 14, pp. 2067-2118. <http://jmlr.org/papers/v14/boehmer13a.html>

APA

Böhmer, W., Grunewalder, S., Shen, Y., Musial, M., & Obermayer, K. (2013). Construction of approximation spaces for reinforcement learning. Journal of Machine Learning Research, 14, 2067-2118. http://jmlr.org/papers/v14/boehmer13a.html

Vancouver

Böhmer W, Grunewalder S, Shen Y, Musial M, Obermayer K. Construction of approximation spaces for reinforcement learning. Journal of Machine Learning Research. 2013;14:2067-2118.

Author

Böhmer, Wendelin ; Grunewalder, Steffen ; Shen, Yun et al. / Construction of approximation spaces for reinforcement learning. In: Journal of Machine Learning Research. 2013 ; Vol. 14. pp. 2067-2118.

Bibtex

@article{5c719930d3424ff1bb90f5c1182ec35a,

title = "Construction of approximation spaces for reinforcement learning",

abstract = "Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation spaces should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance ofapproximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance.",

keywords = "reinforcement learning, diffusion distance, proto value functions, slow feature analysis, least-squares policy iteration, visual robot navigation",

author = "Wendelin B{\"o}hmer and Steffen Grunewalder and Yun Shen and Marek Musial and Klaus Obermayer",

year = "2013",

language = "English",

volume = "14",

pages = "2067--2118",

journal = "Journal of Machine Learning Research",

issn = "1532-4435",

publisher = "Microtome Publishing",

}

RIS

TY - JOUR

T1 - Construction of approximation spaces for reinforcement learning

AU - Böhmer, Wendelin

AU - Grunewalder, Steffen

AU - Shen, Yun

AU - Musial, Marek

AU - Obermayer, Klaus

PY - 2013

Y1 - 2013

N2 - Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation spaces should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance ofapproximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance.

AB - Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation spaces should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance ofapproximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance.

KW - reinforcement learning

KW - diffusion distance

KW - proto value functions

KW - slow feature analysis

KW - least-squares policy iteration

KW - visual robot navigation

M3 - Journal article

VL - 14

SP - 2067

EP - 2118

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -

Research

Associated organisational units

Links

Keywords