Accepted author manuscript, 1.05 MB, PDF document
Final published version
Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Hybrid Safe Reinforcement Learning
T2 - Tackling Distribution Shift and Outliers with the Student-t’s Process
AU - Hickman, Xavier
AU - Lu, Yang
AU - Prince, Daniel
PY - 2025/6/14
Y1 - 2025/6/14
N2 - Safe reinforcement learning (SRL) aims to optimize control policies that maximize long-term reward while adhering to safety constraints. SRL has many real-world applications, such as autonomous vehicles, industrial robotics, and healthcare. Recent advances in offline reinforcement learning (RL) - where agents learn policies from static datasets without interacting with the environment - have made it a promising approach to derive safe control policies. However, offline RL faces significant challenges, such as covariate shift and outliers in the data, which can lead to suboptimal policies. Similarly, online SRL, which derives safe policies through real-time environment interaction, struggles with outliers and often relies on unrealistic regularity assumptions, limiting its practicality. This paper addresses these challenges by proposing a hybrid offline-online approach. First, prior knowledge from offline learning guides online exploration. Then, during online learning, we replace the popular Gaussian Process (GP) with the Student-t's Process (TP) to enhance robustness to covariate shift and outliers.
AB - Safe reinforcement learning (SRL) aims to optimize control policies that maximize long-term reward while adhering to safety constraints. SRL has many real-world applications, such as autonomous vehicles, industrial robotics, and healthcare. Recent advances in offline reinforcement learning (RL) - where agents learn policies from static datasets without interacting with the environment - have made it a promising approach to derive safe control policies. However, offline RL faces significant challenges, such as covariate shift and outliers in the data, which can lead to suboptimal policies. Similarly, online SRL, which derives safe policies through real-time environment interaction, struggles with outliers and often relies on unrealistic regularity assumptions, limiting its practicality. This paper addresses these challenges by proposing a hybrid offline-online approach. First, prior knowledge from offline learning guides online exploration. Then, during online learning, we replace the popular Gaussian Process (GP) with the Student-t's Process (TP) to enhance robustness to covariate shift and outliers.
U2 - 10.1016/j.neucom.2025.129912
DO - 10.1016/j.neucom.2025.129912
M3 - Journal article
VL - 634
JO - Neurocomputing
JF - Neurocomputing
SN - 0925-2312
M1 - 129912
ER -