PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training
[article]
2021
arXiv
pre-print
We additionally show that pre-training our agents with unsupervised exploration substantially increases the mileage of their queries. ...
To enable off-policy learning, we relabel all the agent's past experience when its reward model changes. ...
We thank Abhishek Gupta, Joey Hejna, Qiyang (Colin) Li, Fangchen Liu, Olivia Watkins, and Mandi Zhao for providing helpful feedback and suggestions. ...
arXiv:2106.05091v1
fatcat:vgxklermife53h474rcy7idp2y
Reward Uncertainty for Exploration in Preference-based Reinforcement Learning
[article]
2022
arXiv
pre-print
Our experiments show that an exploration bonus from uncertainty in the learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms on complex robot manipulation tasks from MetaWorld ...
Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward engineering. ...
We thank anonymous reviewers for critically reading the manuscript and suggesting substantial improvements. ...
arXiv:2205.12401v1
fatcat:57eg5ocr6bdbvcjuxuuxpdbp7q
SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning
[article]
2022
arXiv
pre-print
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the state-of-the-art preference-based method on a variety of locomotion and robotic manipulation tasks. ...
Preference-based reinforcement learning (RL) has shown potential for teaching agents to perform the target tasks without a costly, pre-defined reward function by learning the reward with a supervisor's ...
We would like to thank Junsu Kim and anonymous reviewers for providing helpful feedback and suggestions for improving our paper. ...
arXiv:2203.10050v1
fatcat:slkpxwtngzc5ljge2e7cihbauu
B-Pref: Benchmarking Preference-Based Reinforcement Learning
[article]
2021
arXiv
pre-print
Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but these are notoriously hard to specify for complex tasks. ...
Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. ...
We thank Qiyang (Colin) Li and Olivia Watkins for providing helpful feedback and suggestions. ...
arXiv:2111.03026v1
fatcat:qdn6jk55lbfe7aujosk2ivcci4