Benchmarks for Deep Off-Policy Evaluation
[article] · 2021 · arXiv pre-print
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. ...
In order to address this gap, we present a collection of policies that in conjunction with existing offline datasets can be used for benchmarking off-policy evaluation. ...
DOPE: DEEP OFF-POLICY EVALUATION. The goal of the Deep Off-Policy Evaluation (DOPE) benchmark is to provide tasks that are challenging and effective measures of progress for OPE methods, yet is easy to ...
arXiv:2103.16596v1
fatcat:cyedqya5irf4rejzierrvxdy2y
Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods
[article] · 2018 · arXiv pre-print
To answer this question, we propose a simulated benchmark for robotic grasping that emphasizes off-policy learning and generalization to unseen objects. ...
We evaluate the benchmark tasks against a variety of Q-function estimation methods, a method previously proposed for robotic grasping with deep neural network models, and a novel approach based on a combination ...
ACKNOWLEDGEMENTS We thank Laura Downs, Erwin Coumans, Ethan Holly, John-Michael Burke, and Peter Pastor for helping with experiments. ...
arXiv:1802.10264v2
fatcat:apk5d3vs5ne4zd7xhzcldhzd4e
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
[article] · 2021 · arXiv pre-print
To facilitate research, we have released our benchmark tasks and datasets with a comprehensive evaluation of existing algorithms, an evaluation protocol, and open-source examples. ...
In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL. ...
Acknowledgements We would like to thank Abhishek Gupta, Aravind Rajeswaran, Eugene Vinitsky, and Rowan McAllister for providing implementations and assistance in setting up tasks, Michael Janner for informative ...
arXiv:2004.07219v4
fatcat:fkjwmgpxjzbyhmwxffdlu62dz4
CEM-RL: Combining evolutionary and gradient-based methods for policy search
[article] · 2019 · arXiv pre-print
We evaluate the resulting method, cem-rl, on a set of benchmarks classically used in deep RL. ...
off-policy deep RL algorithm. ...
In particular, off-policy deep RL algorithms can use a replay buffer to exploit the same samples as many times as useful, greatly improving sample efficiency. ...
arXiv:1810.01222v3
fatcat:e7saewhrc5f3vj4vw2jsxjiyvu
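The CEM-RL snippet above points to the replay buffer as the mechanism that lets off-policy deep RL reuse the same samples many times. As a minimal illustrative sketch of that idea only (not code from the paper; the class and method names are invented here), a uniform replay buffer can look like this:

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=256):
        # Assumes the buffer already holds at least batch_size transitions.
        # Across training iterations the same stored transition can appear in
        # many minibatches, which is the sample reuse the snippet refers to.
        return random.sample(self.buffer, batch_size)

Each gradient step draws a fresh minibatch from the buffer, so data collected by older policies keeps contributing to learning.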
Trajectory-Based Off-Policy Deep Reinforcement Learning
[article] · 2019 · arXiv pre-print
We evaluate the proposed approach on a series of continuous control benchmark tasks. ...
This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. ...
Trajectory-based objective estimate. Whilst evaluation of the Monte Carlo based expected cost estimate is also possible for deterministic policies, off-policy evaluation is no longer feasible since ...
arXiv:1905.05710v1
fatcat:6po2azo7yndsrjmh4ewcdnfmum
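The snippet in the entry above is truncated before giving the paper's reason. As general background only (this is the standard importance-sampling argument, stated here as an assumption, not a quote from the paper), off-policy Monte Carlo evaluation reweights trajectories collected under a behavior policy \beta by likelihood ratios of the target policy \pi:

\hat{V}^{\pi} = \frac{1}{N} \sum_{i=1}^{N} \Big( \prod_{t=0}^{T-1} \frac{\pi(a_t^{(i)} \mid s_t^{(i)})}{\beta(a_t^{(i)} \mid s_t^{(i)})} \Big) \sum_{t=0}^{T-1} \gamma^{t} r_t^{(i)}

These ratios are only well defined when the behavior policy assigns positive probability to the target policy's actions, which is why deterministic behavioral policies complicate this form of off-policy evaluation.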
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
[article] · 2018 · arXiv pre-print
on-policy and off-policy methods. ...
In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. ...
Acknowledgments We would like to thank Vitchyr Pong for insightful discussions and help in implementing our algorithm as well as providing the DDPG baseline code; Ofir Nachum for offering support in running ...
arXiv:1801.01290v2
fatcat:5737bv4lmzdzxbv6xreow6phfy
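The SAC entry above names the maximum entropy reinforcement learning framework. Up to notation, that objective augments the expected return with an entropy bonus weighted by a temperature \alpha:

J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \big[ r(s_t, a_t) + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t)) \big]

The actor is thus rewarded both for return and for acting as randomly as possible while still succeeding at the task.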
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
[article] · 2022 · arXiv pre-print
Furthermore, we provide a comprehensive review of the latest algorithmic breakthroughs in the field, and a review of existing benchmarks' properties and shortcomings. ...
Finally, we provide our perspective on open problems and propose future research directions for this rapidly growing field. ...
Here, we list the three evaluation metrics from the DOPE benchmark that allow one to perform off-policy evaluation and selection. a) Absolute Error: This metric is intended for off-policy evaluation instead ...
arXiv:2203.01387v2
fatcat:euobvze7kre3fi7blalnbbgefm
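For the absolute-error metric mentioned in the survey snippet above, the DOPE benchmark measures, up to notation, the gap between the true value of the evaluated policy V^\pi and its OPE estimate \hat{V}^\pi:

\mathrm{AbsErr} = \big| V^{\pi} - \hat{V}^{\pi} \big|

In words, it asks how far the estimated policy value is from the value obtained by actually running the policy.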
Measuring and Characterizing Generalization in Deep Reinforcement Learning
[article] · 2018 · arXiv pre-print
We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though ...
We propose a set of practical methods for evaluating agents with these definitions of generalization. ...
Acknowledgements Thanks to Kaleigh Clary, John Foley, and the anonymous AAAI reviewers for thoughtful comments and contributions. ...
arXiv:1812.02868v2
fatcat:fcfqklz5wfadfe5hhywnynn5ai
Exploring the Limitations of Behavior Cloning for Autonomous Driving
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
The code, dataset, benchmark, and agent studied in this paper can be found at ...
We show that Behavior Cloning yields state-of-the-art policies in these complex scenarios and investigate its limitations. ...
This suggests the order of training samples matters for off-policy Imitation Learning, similar to the on-policy case [46]. Our paper is organized as follows. ...
doi:10.1109/iccv.2019.00942
dblp:conf/iccv/CodevillaSLG19
fatcat:sbmedxvpwnhjtk6oajprvnw6mi
ChainerRL: A Deep Reinforcement Learning Library
[article] · 2021 · arXiv pre-print
To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the original papers' experimental settings and reproduce published benchmark results for ...
In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework. ...
Miranda, and all the open source contributors for their contributions to the development of ChainerRL. We thank Kohei Hayashi and Jason Naradowsky for useful comments on how to improve the paper. ...
arXiv:1912.03905v2
fatcat:awe4liu7qfaevesw2dqeutxcoy
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
[article] · 2021 · arXiv pre-print
In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods. ...
We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols. ...
We then picked every fifth game for our offline policy selection task to cover a diverse set of games in terms of difficulty. ...
arXiv:2006.13888v4
fatcat:whpwfgfbkveyfplq2smz5ovmo4
Q-Learning for Continuous Actions with Cross-Entropy Guided Policies
[article] · 2019 · arXiv pre-print
Off-Policy reinforcement learning (RL) is an important class of methods for many problem domains, such as robotics, where the cost of collecting data is high and on-policy methods are consequently intractable ...
standard benchmarks. ...
Off-policy Q-learning methods have been proposed as a more data efficient alternative, typified by Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al., 2015). ...
arXiv:1903.10605v3
fatcat:bbinftbnsnegjotnmwzd3ejrze
CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
[article] · 2019 · arXiv pre-print
It can be regarded as an extension of batch normalization that re-centers data for two different distributions, as present in off-policy learning. ...
Intriguingly, deep off-policy TD algorithms are not commonly used in combination with feature normalization techniques, despite positive effects of normalization in other domains. ...
Conclusion. We identified that normalization based on a mixture of on- and off-policy transitions is an effective strategy to mitigate divergence and to improve returns in deep off-policy TD learning. ...
arXiv:1902.05605v2
fatcat:4o7f2sbsejabncxatb7hjghkeq
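The CrossNorm snippet above describes re-centering data drawn from two different distributions, as arise in off-policy TD learning. The sketch below illustrates only that general idea of pooling statistics across two batches; it is not the paper's exact CrossNorm formulation, and the function name and mixing weight are invented for illustration.

import numpy as np

def mixed_recenter(on_policy_feats, off_policy_feats, alpha=0.5):
    # on_policy_feats, off_policy_feats: arrays of shape (batch, features)
    # drawn from two different transition distributions.
    mu_on = on_policy_feats.mean(axis=0)
    mu_off = off_policy_feats.mean(axis=0)
    # Convex mixture of the per-distribution means (alpha is illustrative).
    mu_mix = alpha * mu_on + (1.0 - alpha) * mu_off
    # Both batches are shifted by the same mixed mean, so the two
    # distributions are re-centered consistently before further processing.
    return on_policy_feats - mu_mix, off_policy_feats - mu_mix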
Verified Probabilistic Policies for Deep Reinforcement Learning
[article] · 2022 · arXiv pre-print
In this paper, we tackle the problem of verifying probabilistic policies for deep reinforcement learning, which are used to, for example, tackle adversarial environments, break symmetries and manage trade-offs ...
Deep reinforcement learning is an increasingly popular technique for synthesising policies to control an agent's interaction with its environment. ...
We show that our approach successfully verifies probabilistic policies trained for several reinforcement learning benchmarks and explore trade-offs in precision and computational efficiency. ...
arXiv:2201.03698v1
fatcat:6q6tle2d45aphn6gicqci7h5f4
Deep Reinforcement Learning for Visual Object Tracking in Videos
[article] · 2017 · arXiv pre-print
The proposed tracking algorithm achieves state-of-the-art performance in an existing tracking benchmark and operates at frame-rates faster than real-time. ...
formulate our model as a recurrent convolutional neural network agent that interacts with a video over time, and our model can be trained with reinforcement learning (RL) algorithms to learn good tracking policies ...
The deep RL algorithm directly optimizes a long-term tracking performance measure which depends on the whole tracking video sequence. ...
arXiv:1701.08936v2
fatcat:csvjdoftvffrrnrsvtvpkpcq6u
Showing results 1 — 15 out of 43,608 results