
Offline Reinforcement Learning with Implicit Q-Learning [article]

Ilya Kostrikov, Ashvin Nair, Sergey Levine
2021 arXiv   pre-print
We dub our method implicit Q-learning (IQL). IQL demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning.  ...  Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation  ...  Due to the property discussed in Theorem 3 we dub our method implicit Q-learning (IQL).  ... 
arXiv:2110.06169v1 fatcat:alptan36azbpxme4qs5237r2lu

Human-centric Dialog Training via Offline Reinforcement Learning [article]

Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, Rosalind Picard
2020 arXiv   pre-print
We start by hosting models online, and gather human feedback from real-time, open-ended conversations, which we then use to train and improve the models using offline reinforcement learning (RL).  ...  We test the resulting dialog model with ratings from 80 users in an open-domain setting and find it achieves significant improvements over existing deep offline RL approaches.  ...  Deep reinforcement learning with double Q-learning. In Thirtieth AAAI Conference on Artificial Intelligence. Harry Weger Jr, Gina R Castle, and Melissa C Emmett. 2010.  ... 
arXiv:2010.05848v1 fatcat:fxelzo2gubahrfjvk34jdwthfi

Offline Reinforcement Learning with Value-based Episodic Memory [article]

Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, Bin Liang
2021 arXiv   pre-print
Further, we introduce implicit planning along offline trajectories to enhance learned V-values and accelerate convergence.  ...  Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data.  ...  Accelerating online reinforcement learning with offline datasets. arXiv preprint arXiv:2006.09359, 2020.  ... 
arXiv:2110.09796v1 fatcat:zl7dshvvbzalpauufgtnqvfxxu

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization [article]

Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine
2021 arXiv   pre-print
It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect.  ...  In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate  ...  Conservative q-learning for offline reinforcement learning. arXiv preprint arXiv:2006.04779, 2020b. Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, and Sergey Levine.  ... 
arXiv:2112.04716v1 fatcat:3jk67c5mpnathc263pzhyy47zi

Dual Behavior Regularized Reinforcement Learning [article]

Chapman Siu, Jason Traish, Richard Yi Da Xu
2021 arXiv   pre-print
Reinforcement learning has been shown to perform a range of complex tasks through interaction with an environment or collected leveraging experience.  ...  Additional ablations provide insights into how our dual behavior regularized reinforcement learning approach is designed compared with other plausible modifications and demonstrates its ability to generalize  ...  The advantage weighted actor-critic (AWAC) algorithm, trains an off-policy critic and an actor with an implicit policy constraint without the use of a behavior policy in the offline reinforcement learning  ... 
arXiv:2109.09037v1 fatcat:wou5srspaje4lc77ug5rrt3upi

IIDQN: An Incentive Improved DQN Algorithm in EBSN Recommender System

Jianan Guo, Yilei Wang, Hui An, Ming Liu, Yiting Zhang, Chunmei Li, Yuling Chen
2022 Security and Communication Networks  
To address these issues, an Incentive Improved DQN (IIDQN) based on Deep Q-Learning Networks (DQN) is proposed.  ...  Event-based Social Networks (EBSN), combining online networks with offline users, provide versatile event recommendations for offline users through complex social networks.  ...  with deep reinforcement learning in 2013 [2, 3].  ... 
doi:10.1155/2022/7502248 fatcat:gsvifk3sdfe55jjghsu5l7ozsa

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [article]

Yiqin Yang, Xiaoteng Ma, Chenghao Li, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao
2021 arXiv   pre-print
Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios.  ...  In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error by only trusting the state-action pairs given in  ...  Algorithm 1: Implicit Constraint Q-Learning in Single-Agent Tasks. Input: Offline buffer B, target network update rate d.  ... 
arXiv:2106.03400v2 fatcat:fxxuhcneafe7beuiypglcu6o5e

IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data [article]

Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox
2020 arXiv   pre-print
In this paper, we propose Implicit Reinforcement without Interaction at Scale (IRIS), a novel framework for learning from large-scale demonstration datasets.  ...  For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task.  ...  We thank Kevin Shih for help with training models on image observations. We thank members of the NVIDIA Seattle Robotics Lab for several helpful discussions and feedback.  ... 
arXiv:1911.05321v2 fatcat:q6kp2tjdrreu3oefw5dyczulj4

AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [article]

Ashvin Nair, Abhishek Gupta, Murtaza Dalal, Sergey Levine
2021 arXiv   pre-print
Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience.  ...  offline data and actually continue to improve it further with online RL.  ...  Effective data-driven methods for deep reinforcement learning should be able to use this data to pre-train offline while improving with online fine-tuning.  ... 
arXiv:2006.09359v6 fatcat:yvtbbrzrmvfrpeibmwdu2h3j5q

An Optimistic Perspective on Offline Reinforcement Learning [article]

Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
2020 arXiv   pre-print
Off-policy reinforcement learning (RL) using a fixed offline dataset of logged interactions is an important consideration in real world applications.  ...  To enhance generalization in the offline setting, we present Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple  ...  Acknowledgements We thank Pablo Samuel Castro for help in understanding and debugging issues with the Dopamine codebase and reviewing an early draft of the paper.  ... 
arXiv:1907.04543v4 fatcat:ocqec67o7zhvtlsz4sjlwvoa7e

Extension: Adaptive Sampling with Implicit Radiance Field [article]

Yuchi Huo
2022 arXiv   pre-print
This manuscript discusses the extension of adaptive light field sampling with implicit radiance fields.  ...  The Q network is trained based on deep reinforcement learning strategies to guide the sampling process of obtaining sparse samples to improve sampling's efficiency.  ...  The converged R network is used to train the Q network, which is also based on the same data set, but uses a stochastic reinforcement learning algorithm to simulate the data distribution of the actual  ... 
arXiv:2202.00855v3 fatcat:iqyx2le56ramdkqqwy6pwnpeim

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems [article]

Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
2020 arXiv   pre-print
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize  ...  Effective offline reinforcement learning methods would be able to extract policies with the maximum possible utility out of the available data, thereby allowing automation of a wide range of decision-making  ...  As with Q-learning and actor-critic methods, model-based reinforcement learning algorithms can be applied to the offline setting naïvely.  ... 
arXiv:2005.01643v3 fatcat:kyw5xc4dijgz3dpuytnbcrmlam

A Survey on Reinforcement Learning for Recommender Systems [article]

Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huanhuan Chen, Chunyan Miao
2022 arXiv   pre-print
In particular, Reinforcement Learning (RL) based recommender systems have become an emerging research topic in recent years.  ...  Empirical results show that RL-based recommendation methods often surpass most of supervised learning methods, owing to the interactive nature and autonomous learning ability.  ...  Based on this method, the authors develop a self-supervised Q-learning model to train two layers with the logged implicit feedback.  ... 
arXiv:2109.10665v2 fatcat:wx5ghn66hzg7faxee54jf7gspq

Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [article]

Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
2021 arXiv   pre-print
Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions.  ...  To this end, we introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).  ...  The authors thank Kevin Lu and Justin Fu for help with setting up the D4RL benchmark tasks.  ... 
arXiv:2106.09119v2 fatcat:clvqausk3fcnfdp7i6rpgzapde

Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [article]

Xi Chen, Ali Ghadirzadeh, Tianhe Yu, Yuan Gao, Jianhao Wang, Wenzhe Li, Bin Liang, Chelsea Finn, Chongjie Zhang
2022 arXiv   pre-print
reinforcement learning methods by 49% on heterogeneous datasets, and by 8% on datasets with narrow and biased distributions.  ...  Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.  ...  Offline Reinforcement Learning The goal of reinforcement learning is to obtain a policy that maximizes a notion of accumulated reward for a task as a Markov decision process (MDP).  ... 
arXiv:2203.08949v1 fatcat:jwj5qo6xd5gmzh2fppiyjqn4bu