Conservative Q-Learning for Offline Reinforcement Learning [article]

Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
2020 arXiv   pre-print
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications.  ...  In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds  ...  Acknowledgements We thank Mohammad Norouzi, Oleh Rybkin, Anton Raichuk, Vitchyr Pong and anonymous reviewers from the Robotic AI and Learning Lab at UC Berkeley for their feedback on an earlier version  ... 
arXiv:2006.04779v3 fatcat:2gcdje7tjbektjmieswjdzcrdu

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems [article]

Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
2020 arXiv   pre-print
Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into powerful decision making engines.  ...  In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize  ...  Offline Reinforcement Learning via Dynamic Programming Dynamic programming methods, such as Q-learning algorithms, in principle can offer a more attractive option for offline reinforcement learning as  ... 
arXiv:2005.01643v3 fatcat:kyw5xc4dijgz3dpuytnbcrmlam

Dealing with the Unknown: Pessimistic Offline Reinforcement Learning [article]

Jinning Li, Chen Tang, Masayoshi Tomizuka, Wei Zhan
2021 arXiv   pre-print
However, if we change the RL scheme to offline setting where the agent can only update its policy via static datasets, one of the major issues in offline reinforcement learning emerges, i.e. distributional  ...  We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to the area where it is familiar by manipulating the value function.  ...  Recently, offline reinforcement learning (offline RL) has emerged as a promising candidate to overcome this barrier.  ... 
arXiv:2111.05440v1 fatcat:l3g246bitzdphludmk3tmiuwsy

Offline Deep Reinforcement Learning for Dynamic Pricing of Consumer Credit [article]

Raad Khraishi, Ramin Okhrati
2022 arXiv   pre-print
We introduce a method for pricing consumer credit using recent advances in offline deep reinforcement learning.  ...  Using both real and synthetic data on consumer credit applications, we demonstrate that our approach using the conservative Q-Learning algorithm is capable of learning an effective personalized pricing  ...  We thank Greig Cowan, Graham Smith, and Zachery Anderson for their valuable feedback and support. We would also like to thank Devesh Batra for his feedback on earlier drafts.  ... 
arXiv:2203.03003v1 fatcat:mavwkh5wm5fuxeqqnbbg2bmmqu

AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [article]

Ashvin Nair, Abhishek Gupta, Murtaza Dalal, Sergey Levine
2021 arXiv   pre-print
Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience.  ...  offline data and actually continue to improve it further with online RL.  ...  For data-driven reinforcement learning, offline datasets consist of trajectories of states, actions and associated rewards.  ... 
arXiv:2006.09359v6 fatcat:yvtbbrzrmvfrpeibmwdu2h3j5q

Boosting Offline Reinforcement Learning with Residual Generative Modeling [article]

Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li
2021 arXiv   pre-print
We propose AQL (action-conditioned Q-learning), a residual generative model to reduce policy approximation error for offline RL.  ...  Offline reinforcement learning (RL) tries to learn the near-optimal policy with recorded offline experience without online exploration.  ...  ., 2019], and Conservative Q-Learning (CQL (H)) [Kumar et al., 2020].  ... 
arXiv:2106.10411v2 fatcat:ybtjgjrlqffvro4yi7h6ryt5hm

S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning [article]

Samarth Sinha, Ajay Mandlekar, Animesh Garg
2021 arXiv   pre-print
Offline reinforcement learning proposes to learn policies from large collected datasets without interacting with the physical environment.  ...  We then combine the best data performing augmentation scheme with a state-of-the-art Q-learning technique, and improve the function approximation of the Q-networks by smoothening out the learned state-action  ...  approximation for Q-learning algorithms in offline RL.  ... 
arXiv:2103.06326v2 fatcat:6wfkk6s765eyrcrhxh4pd32wri

Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills [article]

Yevgen Chebotar, Karol Hausman, Yao Lu, Ted Xiao, Dmitry Kalashnikov, Jake Varley, Alex Irpan, Benjamin Eysenbach, Ryan Julian, Chelsea Finn, Sergey Levine
2021 arXiv   pre-print
We employ goal-conditioned Q-learning with hindsight relabeling and develop several techniques that enable training in a particularly challenging offline setting.  ...  increasingly important for scaling robot learning by reusing past robotic data.  ...  media, Julian Ibarz, Kanishka Rao, and Vincent Vanhoucke for their managerial support, and all of the Robotics at Google team for their support throughout this project.  ... 
arXiv:2104.07749v3 fatcat:ynn6s4rshfhcvowpemmqguf2em

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [article]

Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu
2022 arXiv   pre-print
Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.  ...  Surprisingly, we empirically observe that conservative offline RL algorithms do not work well in the multi-agent setting -- the performance degrades significantly with an increasing number of agents.  ...  Acknowledgements We thank Bei Peng for help with results of MATD3 in the StarCraft II micromanagement benchmark.  ... 
arXiv:2111.11188v3 fatcat:7jza74xvo5bp3pspl2onykwzxm

BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning [article]

Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna
2021 arXiv   pre-print
In this paper, we improve the behavior regularized offline reinforcement learning and propose BRAC+.  ...  The goal of Offline Reinforcement Learning is to address this problem by learning effective policies using previously collected datasets.  ...  We would like to thank Chen-Yu Wei for his help in the mathematical derivation.  ... 
arXiv:2110.00894v1 fatcat:v7lztgrn4vhe3fnb45cngyh3fm

Compressive Features in Offline Reinforcement Learning for Recommender Systems [article]

Hung Nguyen, Minh Nguyen, Long Pham, Jennifer Adorno Nieves
2021 arXiv   pre-print
Our Q-learning-based system is then trained from the processed offline data set.  ...  Our approach is built on a reinforcement learning-based technique and is trained on an offline data set that is publicly available on an IEEE Big Data Cup challenge.  ...  The four deep reinforcement learning methods are Deep Q-Network (DQN), Double Deep Q-Network (Double DQN), Batch-Constrained Deep Q-Learning (BCQ), and Conservative Q-Learning (CQL).  ... 
arXiv:2111.08817v1 fatcat:mf4es4u4e5fsvf72nzfx4ch3vi

Offline Reinforcement Learning as Anti-Exploration [article]

Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist
2021 arXiv   pre-print
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system.  ...  We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent.  ...  Conservative q-learning for offline reinforcement learning. Neural Information Processing Systems (NeurIPS), 2020. [29] M. G. Lagoudakis and R. Parr. Least-squares policy iteration.  ... 
arXiv:2106.06431v1 fatcat:crppp6covnc7plgf6uttkwyili

Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information [article]

Jin Li, Xianyuan Zhan, Zixu Xiao, Guyue Zhou
2021 arXiv   pre-print
The use of demonstration data also allows "warming-up" the RL policies using offline data with imitation learning or the recently emerged offline reinforcement learning algorithms.  ...  However, existing works often treat offline policy learning and online exploration as two separate processes, which are often accompanied by severe performance drop during the offline-to-online transition  ...  Other model-free offline RL algorithms modify the Q-function training objective to learn a conservative, underestimated Q-function [35], [36].  ... 
arXiv:2110.10905v1 fatcat:rtyxzxfzizc57of525nvbck6aa

Offline Inverse Reinforcement Learning [article]

Firas Jarboui, Vianney Perchet
2021 arXiv   pre-print
Current solutions either solve a behaviour cloning problem (which does not leverage the exploratory data) or a reinforced imitation learning problem (using a fixed cost function that discriminates available  ...  The objective of offline RL is to learn optimal policies when a fixed exploratory demonstrations data-set is available and sampling additional observations is impossible (typically if this operation is  ...  CAMERON: Conservative Adversarial Maximum-Entropy inverse Reinforcement learning in Offline settings with Neural network approximators Now that we introduced the building blocks of offline inverse reinforcement  ... 
arXiv:2106.05068v1 fatcat:xt4rnb6zmze4jca42j2a6ilupy

Value Penalized Q-Learning for Recommender Systems [article]

Chengqian Gao, Ke Xu, Peilin Zhao
2021 arXiv   pre-print
Scaling reinforcement learning (RL) to recommender systems (RS) is promising since maximizing the expected cumulative rewards for RL agents meets the objective of RS, i.e., improving customers' long-term  ...  To alleviate the action distribution shift problem in extracting RL policy from static trajectories, we propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm.  ...  Estimate Accurate/Conservative Q-values.  ... 
arXiv:2110.07923v1 fatcat:sxyg7ef3xbfilarpxy2ahc24bq