Offline Reinforcement Learning with Implicit Q-Learning
[article]
2021
arXiv
pre-print
We dub our method implicit Q-learning (IQL). IQL demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning. ...
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation ...
Due to the property discussed in Theorem 3 we dub our method implicit Q-learning (IQL). ...
arXiv:2110.06169v1
fatcat:alptan36azbpxme4qs5237r2lu
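The snippet above does not spell out IQL's mechanism, but the key ingredient described in the paper is expectile regression: the value function is fit with an asymmetric squared loss so that it approximates an upper expectile of the Q-values over dataset actions, never querying out-of-distribution actions. A minimal sketch of that loss (function name and numpy formulation are illustrative, not from the paper):

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss used in expectile regression.

    diff: array of residuals Q(s, a) - V(s).
    tau:  expectile level in (0, 1); tau > 0.5 upweights positive
          residuals, pushing V toward an upper expectile of Q.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2
```

With `tau = 0.5` this reduces to ordinary (scaled) least squares; larger `tau` makes the learned value optimistic over in-dataset actions only.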
Human-centric Dialog Training via Offline Reinforcement Learning
[article]
2020
arXiv
pre-print
We start by hosting models online, and gather human feedback from real-time, open-ended conversations, which we then use to train and improve the models using offline reinforcement learning (RL). ...
We test the resulting dialog model with ratings from 80 users in an open-domain setting and find it achieves significant improvements over existing deep offline RL approaches. ...
Deep reinforcement learning with double Q-learning. In Thirtieth AAAI Conference on Artificial Intelligence.
Harry Weger Jr, Gina R Castle, and Melissa C Emmett. 2010. ...
arXiv:2010.05848v1
fatcat:fxelzo2gubahrfjvk34jdwthfi
Offline Reinforcement Learning with Value-based Episodic Memory
[article]
2021
arXiv
pre-print
Further, we introduce implicit planning along offline trajectories to enhance learned V-values and accelerate convergence. ...
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by effectively utilizing previously collected data. ...
Accelerating online reinforcement learning with offline datasets. arXiv preprint arXiv:2006.09359, 2020. ...
arXiv:2110.09796v1
fatcat:zl7dshvvbzalpauufgtnqvfxxu
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
[article]
2021
arXiv
pre-print
It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. ...
In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate ...
Conservative Q-learning for offline reinforcement learning. arXiv preprint arXiv:2006.04779, 2020b.
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, and Sergey Levine. ...
arXiv:2112.04716v1
fatcat:3jk67c5mpnathc263pzhyy47zi
Dual Behavior Regularized Reinforcement Learning
[article]
2021
arXiv
pre-print
Reinforcement learning has been shown to perform a range of complex tasks through interaction with an environment or by leveraging collected experience. ...
Additional ablations provide insights into how our dual behavior regularized reinforcement learning approach is designed compared with other plausible modifications and demonstrates its ability to generalize ...
The advantage weighted actor-critic (AWAC) algorithm trains an off-policy critic and an actor with an implicit policy constraint, without the use of a behavior policy, in the offline reinforcement learning ...
arXiv:2109.09037v1
fatcat:wou5srspaje4lc77ug5rrt3upi
IIDQN: An Incentive Improved DQN Algorithm in EBSN Recommender System
2022
Security and Communication Networks
To address these issues, an Incentive Improved DQN (IIDQN) based on Deep Q-Learning Networks (DQN) is proposed. ...
Event-based Social Networks (EBSN), combining online networks with offline users, provide versatile event recommendations for offline users through complex social networks. ...
with deep reinforcement learning in 2013 [2, 3]. ...
doi:10.1155/2022/7502248
fatcat:gsvifk3sdfe55jjghsu5l7ozsa
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
[article]
2021
arXiv
pre-print
Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios. ...
In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error by only trusting the state-action pairs given in ...
Algorithm 1: Implicit Constraint Q-Learning in Single-Agent Tasks. Input: offline buffer B, target network update rate d. ...
arXiv:2106.03400v2
fatcat:fxxuhcneafe7beuiypglcu6o5e
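The ICQ snippet describes alleviating extrapolation error by "only trusting the state-action pairs given in" the dataset. A generic sketch of that in-sample bootstrapping idea (this is the common SARSA-style target, not ICQ's exact weighted objective; names and shapes are illustrative):

```python
import numpy as np

def in_sample_target(rewards, next_q_dataset_actions, dones, gamma=0.99):
    """Bellman target that bootstraps only from actions that actually
    appear in the offline buffer, so the Q-network is never evaluated
    on out-of-distribution actions.

    next_q_dataset_actions: Q(s', a') for the *logged* next action a',
    rather than max_a Q(s', a) over all actions.
    """
    return rewards + gamma * (1.0 - dones) * next_q_dataset_actions
```

Contrast with standard Q-learning, whose `max` over all actions is exactly where extrapolation error enters in the offline setting.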
IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data
[article]
2020
arXiv
pre-print
In this paper, we propose Implicit Reinforcement without Interaction at Scale (IRIS), a novel framework for learning from large-scale demonstration datasets. ...
For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task. ...
We thank Kevin Shih for help with training models on image observations. We thank members of the NVIDIA Seattle Robotics Lab for several helpful discussions and feedback. ...
arXiv:1911.05321v2
fatcat:q6kp2tjdrreu3oefw5dyczulj4
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
[article]
2021
arXiv
pre-print
Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience. ...
offline data and actually continue to improve it further with online RL. ...
Effective data-driven methods for deep reinforcement learning should be able to use this data to pre-train offline while improving with online fine-tuning. ...
arXiv:2006.09359v6
fatcat:yvtbbrzrmvfrpeibmwdu2h3j5q
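The AWAC snippet mentions pre-training offline and fine-tuning online; the mechanism (per the paper's abstract elsewhere in this list) is an actor update with an implicit policy constraint, i.e. advantage-weighted regression onto dataset actions. A minimal sketch under that assumption (function name and the simple `exp(A / beta)` weighting are illustrative):

```python
import numpy as np

def awac_policy_loss(log_probs, q_values, v_values, beta=1.0):
    """Advantage-weighted actor loss.

    Dataset actions with high advantage A = Q - V are upweighted by
    exp(A / beta); this exponential weight is the closed-form solution
    of a KL-constrained policy update, so no explicit behavior-policy
    model is needed.
    """
    adv = q_values - v_values
    weights = np.exp(adv / beta)
    return -(weights * log_probs).mean()
```

Smaller `beta` constrains the update more tightly to high-advantage dataset actions; in practice the weights are often clipped or normalized for stability.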
An Optimistic Perspective on Offline Reinforcement Learning
[article]
2020
arXiv
pre-print
Off-policy reinforcement learning (RL) using a fixed offline dataset of logged interactions is an important consideration in real world applications. ...
To enhance generalization in the offline setting, we present Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple ...
Acknowledgements We thank Pablo Samuel Castro for help in understanding and debugging issues with the Dopamine codebase and reviewing an early draft of the paper. ...
arXiv:1907.04543v4
fatcat:ocqec67o7zhvtlsz4sjlwvoa7e
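The REM snippet describes enforcing Bellman consistency on "random convex combinations of multiple" Q-estimates. The mixing step can be sketched as drawing a random point on the probability simplex and combining the ensemble heads with it (Dirichlet sampling is one standard way to get simplex weights; the exact scheme here is an assumption):

```python
import numpy as np

def rem_q(q_heads, rng):
    """Random Ensemble Mixture of Q-heads.

    q_heads: array of shape (K, batch, num_actions) holding K ensemble
    Q-estimates. Returns a single (batch, num_actions) Q-estimate formed
    as a random convex combination: alpha_k >= 0, sum_k alpha_k = 1.
    """
    K = q_heads.shape[0]
    alpha = rng.dirichlet(np.ones(K))        # random point on the simplex
    return np.tensordot(alpha, q_heads, axes=1)
```

Resampling `alpha` at every training step means each gradient update trains a different convex combination, which is what gives REM its ensemble robustness.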
Extension: Adaptive Sampling with Implicit Radiance Field
[article]
2022
arXiv
pre-print
This manuscript discusses the extension of adaptive light field sampling with implicit radiance fields. ...
The Q network is trained based on deep reinforcement learning strategies to guide the sampling process of obtaining sparse samples, improving sampling efficiency. ...
The converged R network is used to train the Q network, which is also based on the same data set, but uses a stochastic reinforcement learning algorithm to simulate the data distribution of the actual ...
arXiv:2202.00855v3
fatcat:iqyx2le56ramdkqqwy6pwnpeim
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
[article]
2020
arXiv
pre-print
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize ...
Effective offline reinforcement learning methods would be able to extract policies with the maximum possible utility out of the available data, thereby allowing automation of a wide range of decision-making ...
As with Q-learning and actor-critic methods, model-based reinforcement learning algorithms can be applied to the offline setting naïvely. ...
arXiv:2005.01643v3
fatcat:kyw5xc4dijgz3dpuytnbcrmlam
A Survey on Reinforcement Learning for Recommender Systems
[article]
2022
arXiv
pre-print
In particular, Reinforcement Learning (RL) based recommender systems have become an emerging research topic in recent years. ...
Empirical results show that RL-based recommendation methods often surpass most of supervised learning methods, owing to the interactive nature and autonomous learning ability. ...
Based on this method, the authors develop a self-supervised Q-learning model to train two layers with the logged implicit feedback. ...
arXiv:2109.10665v2
fatcat:wx5ghn66hzg7faxee54jf7gspq
Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL
[article]
2021
arXiv
pre-print
Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions. ...
To this end, we introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE). ...
The authors thank Kevin Lu and Justin Fu for help with setting up the D4RL benchmark tasks. ...
arXiv:2106.09119v2
fatcat:clvqausk3fcnfdp7i6rpgzapde
Latent-Variable Advantage-Weighted Policy Optimization for Offline RL
[article]
2022
arXiv
pre-print
reinforcement learning methods by 49% on heterogeneous datasets, and by 8% on datasets with narrow and biased distributions. ...
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions. ...
Offline Reinforcement Learning The goal of reinforcement learning is to obtain a policy that maximizes a notion of accumulated reward for a task as a Markov decision process (MDP). ...
arXiv:2203.08949v1
fatcat:jwj5qo6xd5gmzh2fppiyjqn4bu
Showing results 1 — 15 out of 3,479 results