
PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [article]

Kimin Lee, Laura Smith, Pieter Abbeel
2021 arXiv pre-print
We additionally show that pre-training our agents with unsupervised exploration substantially increases the mileage of their queries. ... To enable off-policy learning, we relabel all the agent's past experience when its reward model changes. ... We thank Abhishek Gupta, Joey Hejna, Qiyang (Colin) Li, Fangchen Liu, Olivia Watkins, and Mandi Zhao for providing helpful feedback and suggestions. ...
arXiv:2106.05091v1 fatcat:vgxklermife53h474rcy7idp2y
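The relabeling step described in the snippet above (recomputing rewards for all stored experience whenever the reward model changes) can be illustrated with a minimal Python sketch. It assumes a replay buffer that stores transitions without fixed rewards and a reward model given as a plain function of (state, action); `Transition`, `RelabelingBuffer`, and `relabel` are hypothetical names, not PEBBLE's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

State = Tuple[float, ...]
RewardModel = Callable[[State, int], float]


@dataclass
class Transition:
    state: State
    action: int
    next_state: State
    reward: float = 0.0  # cached label, overwritten on every relabel


@dataclass
class RelabelingBuffer:
    """Hypothetical replay buffer whose rewards always come from the latest reward model."""

    transitions: List[Transition] = field(default_factory=list)

    def add(self, state: State, action: int, next_state: State) -> None:
        self.transitions.append(Transition(state, action, next_state))

    def relabel(self, reward_model: RewardModel) -> None:
        # Recompute every stored reward so that off-policy updates never
        # mix rewards produced by different versions of the reward model.
        for t in self.transitions:
            t.reward = reward_model(t.state, t.action)


buffer = RelabelingBuffer()
buffer.add((0.0,), 1, (1.0,))
buffer.add((1.0,), 0, (2.0,))

# Hypothetical first reward model: rewards action 1.
buffer.relabel(lambda s, a: float(a))
print([t.reward for t in buffer.transitions])  # [1.0, 0.0]

# After new preference feedback the model changes, so all experience is relabeled.
buffer.relabel(lambda s, a: s[0] + 0.5)
print([t.reward for t in buffer.transitions])  # [0.5, 1.5]
```

Storing rewards as a cache rather than as fixed data is what keeps old experience usable for off-policy learning after the reward model is updated.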

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [article]

Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel
2022 arXiv pre-print
Our experiments show that an exploration bonus from uncertainty in the learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms on complex robot manipulation tasks from MetaWorld ... Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward engineering. ... We thank the anonymous reviewers for critically reading the manuscript and suggesting substantial improvements. ...
arXiv:2205.12401v1 fatcat:57eg5ocr6bdbvcjuxuuxpdbp7q
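One way to read "exploration bonus from uncertainty in learned reward" is sketched below in Python, under stated assumptions: the learned reward is an ensemble of reward models, uncertainty is measured as the standard deviation of their predictions, and the bonus weight decays as beta_t = beta0 * (1 - decay) ** t. The ensemble, the decay schedule, and the names `intrinsic_bonus` and `total_reward` are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

State = Sequence[float]
RewardModel = Callable[[State, int], float]


def intrinsic_bonus(ensemble: List[RewardModel], state: State, action: int) -> float:
    # Assumed uncertainty measure: disagreement (population std) across ensemble members.
    return pstdev([r(state, action) for r in ensemble])


def total_reward(
    ensemble: List[RewardModel],
    state: State,
    action: int,
    step: int,
    beta0: float = 0.05,
    decay: float = 1e-4,
) -> float:
    # Extrinsic term: mean prediction of the learned reward ensemble.
    extrinsic = mean(r(state, action) for r in ensemble)
    # Assumed schedule: the exploration weight shrinks geometrically with training steps.
    beta_t = beta0 * (1.0 - decay) ** step
    return extrinsic + beta_t * intrinsic_bonus(ensemble, state, action)


# Three hypothetical ensemble members that disagree about the reward.
ensemble = [lambda s, a, c=c: c for c in (1.0, 2.0, 3.0)]
print(round(total_reward(ensemble, (0.0,), 0, step=0, beta0=0.5), 4))  # 2.4082
```

Where the ensemble agrees, the bonus vanishes and the agent follows the mean learned reward; where it disagrees, the agent is nudged toward states the reward model is unsure about, with less nudging as training proceeds.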

SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [article]

Jongjin Park, Younggyo Seo, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee
2022 arXiv pre-print
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the state-of-the-art preference-based method on a variety of locomotion and robotic manipulation tasks. ... Preference-based reinforcement learning (RL) has shown potential for teaching agents to perform the target tasks without a costly, pre-defined reward function by learning the reward with a supervisor's ... We would like to thank Junsu Kim and the anonymous reviewers for providing helpful feedback and suggestions for improving our paper. ...
arXiv:2203.10050v1 fatcat:slkpxwtngzc5ljge2e7cihbauu
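The two ingredients named in the SURF title, semi-supervised reward learning and data augmentation, can be sketched in Python as confidence-thresholded pseudo-labeling of unlabeled segment pairs plus random temporal cropping. This is a minimal illustration under assumptions not stated in the snippet: segments are lists of scalar per-step features, preferences follow a Bradley-Terry model over summed predicted rewards, and the threshold `tau` and all function names are hypothetical.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

Segment = Sequence[float]  # hypothetical: one scalar feature per time step
RewardModel = Callable[[float], float]


def preference_prob(reward: RewardModel, seg0: Segment, seg1: Segment) -> float:
    """Bradley-Terry probability that seg1 is preferred over seg0."""
    r0 = sum(reward(x) for x in seg0)
    r1 = sum(reward(x) for x in seg1)
    return 1.0 / (1.0 + math.exp(r0 - r1))


def pseudo_label(reward: RewardModel, seg0: Segment, seg1: Segment, tau: float = 0.95) -> Optional[int]:
    """Label an unlabeled pair only when the model is confident; None means discard."""
    p1 = preference_prob(reward, seg0, seg1)
    if p1 >= tau:
        return 1
    if 1.0 - p1 >= tau:
        return 0
    return None


def temporal_crop(seg0: Segment, seg1: Segment, crop_len: int, rng: random.Random) -> Tuple[Segment, Segment]:
    """Crop both segments to crop_len steps at independent random offsets."""
    i = rng.randint(0, len(seg0) - crop_len)
    j = rng.randint(0, len(seg1) - crop_len)
    return seg0[i:i + crop_len], seg1[j:j + crop_len]


def reward(x: float) -> float:
    # Stand-in for a partially trained reward model.
    return x


rng = random.Random(0)
unlabeled = [
    ([0.0, 0.1, 0.0, 0.2], [2.0, 2.0, 2.0, 2.0]),  # clear gap: confident pseudo-label
    ([0.5, 0.5, 0.5, 0.5], [0.5, 0.6, 0.4, 0.5]),  # near tie: discarded
]
for seg0, seg1 in unlabeled:
    c0, c1 = temporal_crop(seg0, seg1, crop_len=3, rng=rng)
    print(pseudo_label(reward, c0, c1))  # prints 1, then None
```

Only the confidently labeled pairs would be added to the reward model's training set, which is how unlabeled experience can stretch a limited budget of supervisor queries.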

B-Pref: Benchmarking Preference-Based Reinforcement Learning [article]

Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel
2021 arXiv pre-print
Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but such functions are notoriously hard to specify for complex tasks. ... Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. ... We thank Qiyang (Colin) Li and Olivia Watkins for providing helpful feedback and suggestions. ...
arXiv:2111.03026v1 fatcat:qdn6jk55lbfe7aujosk2ivcci4
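Learning from a teacher's preferences without a pre-defined reward can be sketched in a few lines of Python: a scripted teacher labels pairs of segments using a hidden ground-truth return, and a one-parameter linear reward is fit by gradient descent on a Bradley-Terry cross-entropy loss. The scripted teacher, its `mistake_rate` parameter, the linear reward, and all function names are illustrative assumptions, not B-Pref's API.

```python
import math
import random
from typing import List, Sequence, Tuple

Segment = Sequence[float]  # hypothetical: one scalar feature per time step
Labeled = Tuple[Segment, Segment, int]  # (seg0, seg1, label); label 1 means seg1 preferred


def true_return(seg: Segment) -> float:
    # Hypothetical ground-truth reward, visible to the teacher but not the learner.
    return sum(seg)


def scripted_teacher(seg0: Segment, seg1: Segment, rng: random.Random, mistake_rate: float = 0.0) -> int:
    """Prefer the segment with higher true return; flip the label with probability mistake_rate."""
    label = 1 if true_return(seg1) > true_return(seg0) else 0
    if rng.random() < mistake_rate:
        label = 1 - label
    return label


def fit_linear_reward(data: List[Labeled], lr: float = 0.1, epochs: int = 200) -> float:
    """Fit r(x) = w * x by gradient descent on the Bradley-Terry cross-entropy loss."""
    w = 0.0
    for _ in range(epochs):
        grad = 0.0
        for seg0, seg1, y in data:
            diff = sum(seg1) - sum(seg0)  # predicted return gap is w * diff
            p1 = 1.0 / (1.0 + math.exp(-w * diff))  # P(seg1 preferred)
            grad += (p1 - y) * diff  # d/dw of -[y log p1 + (1 - y) log(1 - p1)]
        w -= lr * grad / len(data)
    return w


rng = random.Random(0)
data: List[Labeled] = []
for _ in range(50):
    seg0 = [rng.random() for _ in range(3)]
    seg1 = [rng.random() for _ in range(3)]
    data.append((seg0, seg1, scripted_teacher(seg0, seg1, rng)))

w = fit_linear_reward(data)
print(w > 0)  # True: the learned reward increases with the feature, as the hidden one does
```

Because this teacher is noise-free, every gradient step pushes `w` upward; setting `mistake_rate` above zero is one simple way to simulate an imperfect teacher.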