COMBO: Conservative Offline Model-Based Policy Optimization [article]

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
2022 arXiv   pre-print
Through experiments, we find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods on widely studied offline RL benchmarks, including image-based  ...  We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned  ...  Our main contribution is the development of conservative offline model-based policy optimization (COMBO), a new model-based algorithm for offline RL.  ... 
arXiv:2102.08363v2 fatcat:azvca4wb65gc5aypjpidqphgzi

Model-Based Offline Meta-Reinforcement Learning with Regularization [article]

Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang
2022 arXiv   pre-print
In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using conservative policy evaluation and regularized  ...  Motivated by such empirical analysis, we explore model-based offline Meta-RL with regularized Policy Optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative  ...  The main objective is to learn a meta-policy based on a set of offline training tasks $\{M_n\}_{n=1}^{N}$. Conservative Offline Model-Based Policy Optimization (COMBO).  ... 
arXiv:2202.02929v1 fatcat:mfj35r2zazbnzpklwkvy2dw44u

Offline Reinforcement Learning with Reverse Model-based Imagination [article]

Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
2021 arXiv   pre-print
To encourage more conservatism, we propose a novel model-based offline RL framework, called Reverse Offline Model-based Imagination (ROMI).  ...  These reverse imaginations provide informed data augmentation for model-free policy learning and enable conservative generalization beyond the offline dataset.  ...  It is agnostic to policy optimization and thus can be regarded as an effective and flexible plug-in component to induce conservative model-based imaginations for offline RL.  ... 
arXiv:2110.00188v2 fatcat:n5k3r3kwknglxg6w2lz3tcyhae

DROMO: Distributionally Robust Offline Model-based Policy Optimization [article]

Ruizhen Liu, Dazhi Zhong, Zhicong Chen
2021 arXiv   pre-print
To extend the basic idea of regularization without uncertainty quantification, we propose distributionally robust offline model-based policy optimization (DROMO), which leverages the ideas in distributionally  ...  We consider the problem of offline reinforcement learning with model-based control, whose goal is to learn a dynamics model from the experience replay and obtain a pessimism-oriented agent under the learned  ...  COMBO: Conservative Offline Model-Based Policy Optimization, b. URL 2102.08363. Corollary C.3 (Restatement of Corollary 4.3).  ... 
arXiv:2109.07275v1 fatcat:2c4bhmx2x5c4rj4bqqpu2bvfae

Offline Inverse Reinforcement Learning [article]

Firas Jarboui, Vianney Perchet
2021 arXiv   pre-print
The objective of offline RL is to learn optimal policies when a fixed exploratory demonstrations dataset is available and sampling additional observations is impossible (typically if this operation is  ...  The objective is then to learn an optimal policy w.r.t. the expert's latent cost function.  ...  function [14, 28] or the value updates [18, 27] in unobserved state-action pairs) Conservative Offline Model-Based Policy Optimisation (COMBO) [27] falls in the latter category.  ... 
arXiv:2106.05068v1 fatcat:xt4rnb6zmze4jca42j2a6ilupy

BATS: Best Action Trajectory Stitching [article]

Ian Char, Viraj Mehta, Adam Villaflor, John M. Dolan, Jeff Schneider
2022 arXiv   pre-print
the optimal policy of the MDP created by our algorithm avoids this problem.  ...  The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions.  ...  Acknowledgments and Disclosure of Funding This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1745016.  ... 
arXiv:2204.12026v1 fatcat:ilvqmo7sxrho5aqa5rr2txybmy

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning [article]

Jinxin Liu, Hongyin Zhang, Donglin Wang
2022 arXiv   pre-print
the dynamics shift problem in which prior offline methods do not scale well, and 3) derive a simple dynamics-aware reward augmentation (DARA) framework from both model-free and model-based offline settings  ...  The experimental evaluation demonstrates that DARA, by augmenting rewards in the source offline dataset, can acquire an adaptive policy for the target environment and yet significantly reduce the requirement  ...  MOPO: model-based offline policy optimization.  ... 
arXiv:2203.06662v1 fatcat:hwqga3uo4jhcdinmu7qh2jkyeu

Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [article]

Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
2021 arXiv   pre-print
Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions.  ...  To this end, we introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).  ...  CQL: Conservative Q-Learning (CQL) [17] is a leading offline model-free baseline.  ... 
arXiv:2106.09119v2 fatcat:clvqausk3fcnfdp7i6rpgzapde

How to Leverage Unlabeled Data in Offline Reinforcement Learning [article]

Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Chelsea Finn, Sergey Levine
2022 arXiv   pre-print
Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.  ...  we find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing both in theory and in practice, without learning any reward model  ...  offline RL method COMBO (Yu et al., 2021b) that trains a dynamics model on all of the data and performs model-based offline training using the learned model.  ... 
arXiv:2202.01741v2 fatcat:qdt63dnmtneivdlvzhni64l7lq

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning [article]

Marc Rigter, Bruno Lacerda, Nick Hawes
2022 arXiv   pre-print
In this work, we present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL.  ...  Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem.  ...  Model-Based Offline RL Algorithms Model-based approaches to offline RL use a model of the MDP to help train the policy.  ... 
arXiv:2204.12581v1 fatcat:gfim2ktl2rds5bsx5zy5cg7liu

A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems [article]

Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, Esther Luna Colombini
2022 arXiv   pre-print
Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions, making it feasible to extract policies from large and diverse training datasets.  ...  In this work, we propose a unifying taxonomy to classify offline RL methods.  ...  [18] proposed a method dubbed Conservative Offline Model-Based policy Optimization (COMBO), which is a model-based version of CQL [17].  ... 
arXiv:2203.01387v2 fatcat:euobvze7kre3fi7blalnbbgefm

Revisiting Design Choices in Offline Model-Based Reinforcement Learning [article]

Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts
2022 arXiv   pre-print
Significant progress has been made recently in offline model-based reinforcement learning, approaches which leverage a learned dynamics model.  ...  implemented through a penalty based on estimated model uncertainty.  ...  MBRL works by training a dynamics model from the offline data, then optimizing a policy using imaginary rollouts from the model.  ... 
arXiv:2110.04135v2 fatcat:7q3dbl4p6natxkhyriedg3lbdu

Offline Reinforcement Learning as Anti-Exploration [article]

Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist
2021 arXiv   pre-print
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system.  ...  We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent.  ...  Combo: Conservative offline model-based policy optimization. arXiv preprint arXiv:2102.08363, 2021. [58] C. Zhou and R. C. Paffenroth. Anomaly detection with robust deep autoencoders.  ... 
arXiv:2106.06431v1 fatcat:crppp6covnc7plgf6uttkwyili

Adversarially Trained Actor Critic for Offline Reinforcement Learning [article]

Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
2022 arXiv   pre-print
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning under insufficient data coverage, based on a two-player Stackelberg game framing of offline  ...  RL: A policy actor competes against an adversarially trained value critic, who finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy.  ...  Table 1 show that ATAC and ATAC* outperform other model-free offline RL baselines consistently and model-based method COMBO mostly.  ... 
arXiv:2202.02446v1 fatcat:j3xkqoiotbeoflelq35ngyttty

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism [article]

Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell
2021 arXiv   pre-print
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection.  ...  Based on the composition of the offline dataset, two main categories of methods are used: imitation learning which is suitable for expert datasets and vanilla offline RL which often requires uniform coverage  ...  COMBO: Conservative offline model-based policy optimization. arXiv preprint arXiv:2102.08363, 2021. Ekim Yurtsever, Jacob Lambert, Alexander Carballo, and Kazuya Takeda.  ... 
arXiv:2103.12021v1 fatcat:7wbhgdjr65gx7lme7gmf35txum