43,608 Hits in 3.4 sec

Benchmarks for Deep Off-Policy Evaluation [article]

Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine (+1 others)
2021 arXiv   pre-print
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.  ...  In order to address this gap, we present a collection of policies that in conjunction with existing offline datasets can be used for benchmarking off-policy evaluation.  ...  DOPE: DEEP OFF-POLICY EVALUATION The goal of the Deep Off-Policy Evaluation (DOPE) benchmark is to provide tasks that are challenging and effective measures of progress for OPE methods, yet is easy to  ... 
arXiv:2103.16596v1 fatcat:cyedqya5irf4rejzierrvxdy2y

Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods [article]

Deirdre Quillen, Eric Jang, Ofir Nachum, Chelsea Finn, Julian Ibarz, Sergey Levine
2018 arXiv   pre-print
To answer this question, we propose a simulated benchmark for robotic grasping that emphasizes off-policy learning and generalization to unseen objects.  ...  We evaluate the benchmark tasks against a variety of Q-function estimation methods, a method previously proposed for robotic grasping with deep neural network models, and a novel approach based on a combination  ...  ACKNOWLEDGEMENTS We thank Laura Downs, Erwin Coumans, Ethan Holly, John-Michael Burke, and Peter Pastor for helping with experiments.  ... 
arXiv:1802.10264v2 fatcat:apk5d3vs5ne4zd7xhzcldhzd4e

D4RL: Datasets for Deep Data-Driven Reinforcement Learning [article]

Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine
2021 arXiv   pre-print
To facilitate research, we have released our benchmark tasks and datasets with a comprehensive evaluation of existing algorithms, an evaluation protocol, and open-source examples.  ...  In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.  ...  Acknowledgements We would like to thank Abhishek Gupta, Aravind Rajeswaran, Eugene Vinitsky, and Rowan McAllister for providing implementations and assistance in setting up tasks, Michael Janner for informative  ... 
arXiv:2004.07219v4 fatcat:fkjwmgpxjzbyhmwxffdlu62dz4

CEM-RL: Combining evolutionary and gradient-based methods for policy search [article]

Aloïs Pourchot, Olivier Sigaud
2019 arXiv   pre-print
We evaluate the resulting method, CEM-RL, on a set of benchmarks classically used in deep RL.  ...  off-policy deep RL algorithm.  ...  In particular, off-policy deep RL algorithms can use a replay buffer to exploit the same samples as many times as useful, greatly improving sample efficiency.  ...
arXiv:1810.01222v3 fatcat:e7saewhrc5f3vj4vw2jsxjiyvu
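The CEM-RL snippet credits the replay buffer for the sample efficiency of off-policy deep RL: stored transitions can be drawn into many training batches. A minimal sketch of the idea (the class name and API are illustrative, not from the paper):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of transitions; each stored sample can be drawn
    many times, which is what gives off-policy methods their sample
    efficiency relative to on-policy training."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling with replacement over all stored transitions.
        return random.choices(self.buffer, k=batch_size)

buffer = ReplayBuffer(capacity=10_000)
for t in range(100):
    buffer.add(t, 0, 1.0, t + 1, False)
batch = buffer.sample(32)  # the same transition may appear in many batches
```

In practice each sampled batch feeds a gradient step on the Q-function, so one environment interaction contributes to many updates.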

Trajectory-Based Off-Policy Deep Reinforcement Learning [article]

Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel
2019 arXiv   pre-print
We evaluate the proposed approach on a series of continuous control benchmark tasks.  ...  This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies.  ...  Trajectory-based objective estimate: while evaluation of the Monte Carlo-based expected cost estimate is also possible for deterministic policies, off-policy evaluation is no longer feasible since  ...
arXiv:1905.05710v1 fatcat:6po2azo7yndsrjmh4ewcdnfmum

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor [article]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
2018 arXiv   pre-print
on-policy and off-policy methods.  ...  In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework.  ...  Acknowledgments We would like to thank Vitchyr Pong for insightful discussions and help in implementing our algorithm as well as providing the DDPG baseline code; Ofir Nachum for offering support in running  ... 
arXiv:1801.01290v2 fatcat:5737bv4lmzdzxbv6xreow6phfy
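Soft actor-critic trains against an entropy-regularized objective, so the one-step bootstrap target adds an entropy bonus to the next-state value. A hedged sketch for a discrete action set (the function name and toy numbers are illustrative; SAC itself uses continuous actions and learned networks):

```python
import math

def soft_q_target(reward, next_log_probs, next_q_values, alpha=0.2, gamma=0.99):
    """One-step target for entropy-regularized ("soft") Q-learning:
    V(s') = E_{a'~pi}[ Q(s', a') - alpha * log pi(a'|s') ],
    so higher-entropy policies receive a value bonus scaled by alpha."""
    soft_v = sum(math.exp(lp) * (q - alpha * lp)          # pi(a') * (Q - alpha*log pi)
                 for lp, q in zip(next_log_probs, next_q_values))
    return reward + gamma * soft_v

# Uniform policy over two actions: log pi = log(0.5) for each action.
lp = math.log(0.5)
target = soft_q_target(1.0, [lp, lp], [2.0, 2.0])
# Exceeds the entropy-free target 1.0 + 0.99 * 2.0 by the entropy bonus.
```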

A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems [article]

Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, Esther Luna Colombini
2022 arXiv   pre-print
Furthermore, we provide a comprehensive review of the latest algorithmic breakthroughs in the field, and a review of existing benchmarks' properties and shortcomings.  ...  Finally, we provide our perspective on open problems and propose future research directions for this rapidly growing field.  ...  Here, we list the three evaluation metrics from the DOPE benchmark that allow one to perform off-policy evaluation and selection. a) Absolute Error: this metric is intended for off-policy evaluation instead  ...
arXiv:2203.01387v2 fatcat:euobvze7kre3fi7blalnbbgefm
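The survey's snippet mentions DOPE's absolute-error metric for off-policy evaluation. A minimal illustration of the metric and of using OPE estimates for policy selection (the policy names and values below are made up):

```python
def absolute_error(true_value, estimated_value):
    """Absolute error between a policy's true (discounted) return and
    the value an off-policy evaluation method assigned to it; lower is
    better for OPE."""
    return abs(true_value - estimated_value)

# Policy selection via OPE: pick the candidate the estimator ranks highest,
# then score the estimator by how far off its estimate was.
estimates = {"pi_a": 10.2, "pi_b": 8.7, "pi_c": 9.9}   # hypothetical OPE estimates
best = max(estimates, key=estimates.get)
err = absolute_error(true_value=9.5, estimated_value=estimates[best])
```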

Measuring and Characterizing Generalization in Deep Reinforcement Learning [article]

Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael Littman, David Jensen
2018 arXiv   pre-print
We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though  ...  We propose a set of practical methods for evaluating agents with these definitions of generalization.  ...  Acknowledgements Thanks to Kaleigh Clary, John Foley, and the anonymous AAAI reviewers for thoughtful comments and contributions.  ... 
arXiv:1812.02868v2 fatcat:fcfqklz5wfadfe5hhywnynn5ai

Exploring the Limitations of Behavior Cloning for Autonomous Driving

Felipe Codevilla, Eder Santana, Antonio Lopez, Adrien Gaidon
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
The code, dataset, benchmark, and agent studied in this paper can be found at  ...  We show that Behavior Cloning yields state-of-the-art policies in these complex scenarios and investigate its limitations.  ...  This suggests the order of training samples matters for off-policy Imitation Learning, similar to the on-policy case [46] . Our paper is organized as follows.  ... 
doi:10.1109/iccv.2019.00942 dblp:conf/iccv/CodevillaSLG19 fatcat:sbmedxvpwnhjtk6oajprvnw6mi

ChainerRL: A Deep Reinforcement Learning Library [article]

Yasuhiro Fujita, Prabhat Nagarajan, Toshiki Kataoka, Takahiro Ishikawa
2021 arXiv   pre-print
To foster reproducible research, and for instructional purposes, ChainerRL provides scripts that closely replicate the original papers' experimental settings and reproduce published benchmark results for  ...  In this paper, we introduce ChainerRL, an open-source deep reinforcement learning (DRL) library built using Python and the Chainer deep learning framework.  ...  Miranda, and all the open source contributors for their contributions to the development of ChainerRL. We thank Kohei Hayashi and Jason Naradowsky for useful comments on how to improve the paper.  ... 
arXiv:1912.03905v2 fatcat:awe4liu7qfaevesw2dqeutxcoy

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [article]

Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li (+6 others)
2021 arXiv   pre-print
In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.  ...  We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols.  ...  We then picked every fifth game for our offline policy selection task to cover a diverse set of games in terms of difficulty.  ...
arXiv:2006.13888v4 fatcat:whpwfgfbkveyfplq2smz5ovmo4

Q-Learning for Continuous Actions with Cross-Entropy Guided Policies [article]

Riley Simmons-Edler, Ben Eisner, Eric Mitchell, Sebastian Seung, Daniel Lee
2019 arXiv   pre-print
Off-Policy reinforcement learning (RL) is an important class of methods for many problem domains, such as robotics, where the cost of collecting data is high and on-policy methods are consequently intractable  ...  standard benchmarks.  ...  Off-policy Q-learning methods have been proposed as a more data efficient alternative, typified by Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al., 2015) .  ... 
arXiv:1903.10605v3 fatcat:bbinftbnsnegjotnmwzd3ejrze
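The title above points at using the cross-entropy method (CEM) to select actions that score highly under a learned Q-function. A toy sketch of the generic CEM loop for a one-dimensional continuous action (the hyperparameters and the quadratic stand-in for Q are illustrative, not the paper's):

```python
import random
import statistics

def cem_select_action(q_fn, mu=0.0, sigma=1.0, iters=3, pop=64, elites=8):
    """Cross-entropy method over a 1-D action: sample a population from a
    Gaussian, keep the actions with the highest Q-values, and refit the
    sampling distribution to those elites."""
    for _ in range(iters):
        actions = [random.gauss(mu, sigma) for _ in range(pop)]
        best = sorted(actions, key=q_fn, reverse=True)[:elites]
        mu = statistics.mean(best)
        sigma = statistics.stdev(best) + 1e-6  # keep a floor on exploration
    return mu

# Toy Q-function peaked at a = 0.5; CEM should converge near the peak.
random.seed(0)
a = cem_select_action(lambda a: -(a - 0.5) ** 2)
```

This sidesteps the argmax over continuous actions that makes plain Q-learning awkward in continuous control.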

CrossNorm: Normalization for Off-Policy TD Reinforcement Learning [article]

Aditya Bhatt, Max Argus, Artemij Amiranashvili, Thomas Brox
2019 arXiv   pre-print
It can be regarded as an extension of batch normalization that re-centers data for two different distributions, as present in off-policy learning.  ...  Intriguingly, deep off-policy TD algorithms are not commonly used in combination with feature normalization techniques, despite positive effects of normalization in other domains.  ...  Conclusion We identified that normalization based on a mixture of on-and off-policy transitions is an effective strategy to mitigate divergence and to improve returns in deep off-policy TD learning.  ... 
arXiv:1902.05605v2 fatcat:4o7f2sbsejabncxatb7hjghkeq
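The CrossNorm snippet describes re-centering data drawn from two different distributions, on-policy and off-policy. A rough sketch of that mixture re-centering idea (this is an assumption about the mechanism for illustration, not the paper's exact normalization layer, which operates on network activations):

```python
def crossnorm_center(on_policy_batch, off_policy_batch, tau=0.5):
    """Subtract a convex combination of the on-policy and off-policy
    means, so both distributions are centered with one shared statistic
    rather than each with its own."""
    mu_on = sum(on_policy_batch) / len(on_policy_batch)
    mu_off = sum(off_policy_batch) / len(off_policy_batch)
    mu_mix = tau * mu_on + (1 - tau) * mu_off
    return ([x - mu_mix for x in on_policy_batch],
            [x - mu_mix for x in off_policy_batch])

on_c, off_c = crossnorm_center([1.0, 2.0, 3.0], [5.0, 6.0, 7.0])
# mixture mean = 0.5 * 2 + 0.5 * 6 = 4; the pooled centered data has mean 0
```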

Verified Probabilistic Policies for Deep Reinforcement Learning [article]

Edoardo Bacci, David Parker
2022 arXiv   pre-print
In this paper, we tackle the problem of verifying probabilistic policies for deep reinforcement learning, which are used to, for example, tackle adversarial environments, break symmetries and manage trade-offs  ...  Deep reinforcement learning is an increasingly popular technique for synthesising policies to control an agent's interaction with its environment.  ...  We show that our approach successfully verifies probabilistic policies trained for several reinforcement learning benchmarks and explore trade-offs in precision and computational efficiency.  ... 
arXiv:2201.03698v1 fatcat:6q6tle2d45aphn6gicqci7h5f4

Deep Reinforcement Learning for Visual Object Tracking in Videos [article]

Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang
2017 arXiv   pre-print
The proposed tracking algorithm achieves state-of-the-art performance on an existing tracking benchmark and operates at frame rates faster than real time.  ...  formulate our model as a recurrent convolutional neural network agent that interacts with a video over time, and our model can be trained with reinforcement learning (RL) algorithms to learn good tracking policies  ...  The deep RL algorithm directly optimizes a long-term tracking performance measure which depends on the whole tracking video sequence.  ...
arXiv:1701.08936v2 fatcat:csvjdoftvffrrnrsvtvpkpcq6u