
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning [article]

Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn
2021 arXiv   pre-print
To address this challenge, we develop a simple technique for data-sharing in multi-task offline RL that routes data based on the improvement over the task-specific data.  ...  However, sharing data across all tasks in multi-task offline RL performs surprisingly poorly in practice.  ...  CF is a CIFAR Fellow in the Learning in Machines and Brains program.  ... 
arXiv:2109.08128v1 fatcat:mdtzyivqmvdepkv6s3v6ycwpii

How to Leverage Unlabeled Data in Offline Reinforcement Learning [article]

Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Chelsea Finn, Sergey Levine
2022 arXiv   pre-print
Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.  ...  How can we best leverage such unlabeled data in offline RL? One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.  ...  Introduction Offline reinforcement learning (RL) provides the promise of a fully data-driven framework for learning performant policies.  ... 
arXiv:2202.01741v2 fatcat:qdt63dnmtneivdlvzhni64l7lq

Offline Distillation for Robot Lifelong Learning with Imbalanced Experience [article]

Wenxuan Zhou, Steven Bohez, Jan Humplik, Abbas Abdolmaleki, Dushyant Rao, Markus Wulfmeier, Tuomas Haarnoja, Nicolas Heess
2022 arXiv   pre-print
We show that the Offline Distillation Pipeline achieves better performance across all the encountered environments without affecting data collection.  ...  We investigate two challenges in such a lifelong learning setting: first, existing off-policy algorithms struggle with the trade-off between being conservative to maintain good performance in the old environment  ...  While most of the work in offline RL focuses on one task, Yu et al. (2021) studies multi-task offline RL with the goal of improving single task performance by selectively sharing the data across tasks  ... 
arXiv:2204.05893v1 fatcat:6jlcaw3ddfhfdblymlnghytaqi

TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations [article]

Shiyu Huang, Wenze Chen, Longfei Zhang, Shizhen Xu, Ziyang Li, Fengming Zhu, Deheng Ye, Ting Chen, Jun Zhu
2021 arXiv   pre-print
We then developed a distributed learning system and new offline algorithms to learn a powerful multi-agent AI from the fixed single-agent dataset.  ...  Deep reinforcement learning (DRL) has achieved super-human performance on complex video games (e.g., StarCraft II and Dota II).  ...  Baselines: CQL: CQL is an offline reinforcement learning algorithm that tries to learn conservative Q-values via adding penalties on the Q-functions.  ... 
arXiv:2110.04507v5 fatcat:h2u4yhlpvjg6jfikjtefzz4v5y

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [article]

Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martín-Martín
2021 arXiv   pre-print
We also highlight opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods  ...  Our study analyzes the most critical challenges when learning from offline human data for manipulation.  ...  Acknowledgments We would like to thank Albert Tung for helping with the RoboTurk data collection system, Jim Fan for providing timely lab cluster support, and Helen Roman for helping order items for the  ... 
arXiv:2108.03298v2 fatcat:p7fmn5qjynftbjaymffoifpf5y

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [article]

Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao
2021 arXiv   pre-print
Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios.  ...  the dataset for value estimation.  ...  Conclusion In this work, we demonstrate a critical problem in multi-agent off-policy reinforcement learning with finite data, where it introduces accumulated extrapolation error in the number of agents  ... 
arXiv:2106.03400v2 fatcat:fxxuhcneafe7beuiypglcu6o5e

Model-Based Offline Meta-Reinforcement Learning with Regularization [article]

Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang
2022 arXiv   pre-print
Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy.  ...  Motivated by such empirical analysis, we explore model-based offline Meta-RL with regularized Policy Optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative  ...  Conservative data sharing for multi-task offline reinforcement learning. arXiv preprint arXiv:2109.08128, 2021a.  ... 
arXiv:2202.02929v1 fatcat:mfj35r2zazbnzpklwkvy2dw44u

Offline Reinforcement Learning with Reverse Model-based Imagination [article]

Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
2021 arXiv   pre-print
These reverse imaginations provide informed data augmentation for model-free policy learning and enable conservative generalization beyond the offline dataset.  ...  In offline reinforcement learning (offline RL), one of the main challenges is to deal with the distributional shift between the learning policy and the given dataset.  ...  Acknowledgments and Disclosure of Funding The authors would like to thank the anonymous reviewers, Zhizhou Ren, Kun Xu, and Hang Su for valuable and insightful discussions and helpful suggestions.  ... 
arXiv:2110.00188v2 fatcat:n5k3r3kwknglxg6w2lz3tcyhae

Offline Decentralized Multi-Agent Reinforcement Learning [article]

Jiechuan Jiang, Zongqing Lu
2021 arXiv   pre-print
In many real-world multi-agent cooperative tasks, due to high cost and risk, agents cannot interact with the environment and collect experiences during learning, but have to learn from offline datasets  ...  Mathematically, we prove the convergence of Q-learning under the non-stationary transition probabilities after modification.  ...  Conclusion and Discussion In this paper, we proposed MABCQ for offline and fully decentralized multi-agent reinforcement learning.  ... 
arXiv:2108.01832v1 fatcat:gtyeobjxgjgrldrfnoy7f4kt7u

Multi-Task Conditional Imitation Learning for Autonomous Navigation at Crowded Intersections [article]

Zeyu Zhu, Huijing Zhao
2022 arXiv   pre-print
A multi-task conditional imitation learning framework is proposed to adapt both lateral and longitudinal control tasks for safe and efficient interaction.  ...  In recent years, great efforts have been devoted to deep imitation learning for autonomous driving control, where raw sensory inputs are directly mapped to control actions.  ...  Multi-task Learning in Computer Vision Multi-task learning [28], [29] aims to improve learning efficiency by learning multiple complementary tasks from shared representations.  ... 
arXiv:2202.10124v1 fatcat:xzhprx6hwzh67ekv5eokx7ddqu

Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [article]

Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
2021 arXiv   pre-print
Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions.  ...  We investigate how to improve the performance of offline RL algorithms, their robustness to the quality of offline data, as well as their generalization capabilities.  ...  The authors thank Kevin Lu and Justin Fu for help with setting up the D4RL benchmark tasks.  ... 
arXiv:2106.09119v2 fatcat:clvqausk3fcnfdp7i6rpgzapde

Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [article]

Zhihai Wang, Jie Wang, Qi Zhou, Bin Li, Houqiang Li
2021 arXiv   pre-print
Model-based reinforcement learning algorithms, which aim to learn a model of the environment to make decisions, are more sample efficient than their model-free counterparts.  ...  To tackle this problem, we propose the conservative model-based actor-critic (CMBAC), a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.  ...  Acknowledgements We would like to thank all the anonymous reviewers for their insightful comments.  ... 
arXiv:2112.10504v1 fatcat:o6gzj3gbqjdqbnoxt7byfsdzp4

Reinforcement Learning in Practice: Opportunities and Challenges [article]

Yuxi Li
2022 arXiv   pre-print
In this article, we first give a brief introduction to reinforcement learning (RL), and its relationship with deep learning, machine learning and AI.  ...  Then we discuss challenges, in particular, 1) foundation, 2) representation, 3) reward, 4) exploration, 5) model, simulation, planning, and benchmarks, 6) off-policy/offline learning, 7) learning to learn  ...  Examples in this category include methods for transfer learning, multi-task learning, and few-shot learning.  ... 
arXiv:2202.11296v2 fatcat:xdtsmme22rfpfn6rgfotcspnhy

A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems [article]

Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, Esther Luna Colombini
2022 arXiv   pre-print
With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems, such as playing complex games from  ...  Effective offline RL algorithms have a much wider range of applications than online RL, being particularly appealing for real-world applications such as education, healthcare, and robotics.  ...  Index Terms: deep learning, reinforcement learning, offline reinforcement learning, batch reinforcement learning.  ... 
arXiv:2203.01387v2 fatcat:euobvze7kre3fi7blalnbbgefm

Offline Reinforcement Learning from Images with Latent Space Models [article]

Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn
2020 arXiv   pre-print
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.  ...  However, the ability to learn directly from rich observation spaces like images is critical for real-world applications such as robotics.  ...  Acknowledgments We want to thank Suraj Nair for sharing the BEE dataset with us and his help with setting up the Panda drawer environment.  ... 
arXiv:2012.11547v1 fatcat:krdth53gm5d2ddnmrbygr3atgu