
Wasserstein Unsupervised Reinforcement Learning [article]

Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji
2021 arXiv   pre-print
Therefore we propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), in which we directly maximize the distance between the state distributions induced by different policies.  ...  Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward.  ...  Nonetheless, the primal-form estimation methods show competitive accuracy empirically.  ...
arXiv:2110.07940v1 fatcat:m5uljtzqwndeldfytdmmms4uaa
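The central quantity in the WURL snippet above is the pairwise distance between the state distributions induced by different policies. The sketch below shows one simple way to estimate such distances from rollout samples; it is an illustration only, not the authors' estimator, and it assumes 1-D state features so that SciPy's closed-form 1-D Wasserstein-1 distance applies. The nearest-neighbour "diversity" score at the end is a hypothetical choice, not taken from the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def pairwise_w1(state_samples):
    """Empirical 1-D Wasserstein-1 distances between the state distributions
    visited by each policy (one array of sampled states per policy)."""
    n = len(state_samples)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = wasserstein_distance(state_samples[i], state_samples[j])
    return d

# Toy example: three "policies" visiting different regions of a 1-D state space.
rng = np.random.default_rng(0)
rollouts = [rng.normal(loc=mu, scale=0.5, size=500) for mu in (0.0, 1.0, 3.0)]
D = pairwise_w1(rollouts)

# A diversity-seeking intrinsic signal could use each policy's distance to its
# nearest neighbour (illustrative only).
diversity = D.copy()
np.fill_diagonal(diversity, np.inf)
print(diversity.min(axis=1))
```

Higher-dimensional states would require a sliced or dual estimator, which is where the primal-form estimation methods mentioned in the snippet become relevant.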

Wasserstein Robust Reinforcement Learning [article]

Mohammed Amin Abdullah and Hang Ren and Haitham Bou Ammar and Vladimir Milenkovic and Rui Luo and Mingtian Zhang and Jun Wang
2019 arXiv   pre-print
Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint, yielding a correct and convergent solver.  ...  Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world.  ...  This section formalises robust reinforcement learning by equipping agents with the capability of determining well-behaved policies under worst-case models  ...
arXiv:1907.13196v4 fatcat:qeldcwoy6jbh7jnauw62acmxqu
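The min-max structure can be illustrated on a toy problem. The sketch below is not the paper's solver; it relies on the fact that, for Gaussian transition noise with a fixed covariance, a 2-Wasserstein ball around the nominal model reduces to a Euclidean ball on the mean shift, so the adversary's best response can be computed in closed form while the agent descends the resulting worst-case cost. All quantities (the one-step quadratic cost, budgets, step sizes) are made up for illustration.

```python
import numpy as np

eps = 0.3    # Wasserstein budget; equals a Euclidean bound on the mean shift here
sigma = 0.1  # fixed transition noise scale
s0 = 1.0     # initial state
a = 0.0      # agent's action (minimizer)

def worst_case_shift(a):
    # Inner maximization: the shift of magnitude <= eps that maximizes
    # E[(s0 + a + b + sigma*z)^2] points in the direction of (s0 + a).
    return eps * np.sign(s0 + a) if s0 + a != 0 else eps

for t in range(200):
    b = worst_case_shift(a)          # adversary best-responds within the ball
    grad_a = 2.0 * (s0 + a + b)      # outer minimization of the worst-case cost
    a -= (0.5 / (t + 5)) * grad_a    # diminishing step size for convergence

b = worst_case_shift(a)
robust_cost = (s0 + a + b) ** 2 + sigma ** 2
print(f"action={a:.3f}, worst-case shift={b:+.3f}, robust cost={robust_cost:.4f}")
```

The agent converges toward the action that neutralizes the nominal dynamics, leaving a residual cost of roughly eps^2 + sigma^2 that no policy can remove against the worst-case model.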

Efficient Wasserstein Natural Gradients for Reinforcement Learning [article]

Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton
2021 arXiv   pre-print
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL).  ...  The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed up optimization.  ...  Defining efficient optimization algorithms for reinforcement learning (RL) that are able to leverage a meaningful measure of similarity between policies is a longstanding and challenging problem.  ...
arXiv:2010.05380v4 fatcat:7bhuob2ngjfdbfzmn4q44gyxra
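For diagonal Gaussian policies, the Wasserstein penalty mentioned in the snippet has a closed form: W2^2 between N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2)) is ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2. The sketch below only illustrates that geometry as a proximal penalty on a policy update; it is not the WNG algorithm itself, and `surrogate` is a hypothetical stand-in for the usual policy-gradient objective.

```python
import numpy as np

def w2_sq_diag_gauss(mu1, sigma1, mu2, sigma2):
    """Squared W2 between diagonal Gaussians N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2))."""
    return np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2)

def penalized_objective(mu, sigma, mu_old, sigma_old, surrogate, lam=1.0):
    # The Wasserstein penalty keeps the updated policy close to the old one in W2,
    # playing the role of a trust region in distribution space.
    return surrogate(mu, sigma) - lam * w2_sq_diag_gauss(mu, sigma, mu_old, sigma_old)

# Toy check with a made-up surrogate that prefers mu = 2 and sigma = 0.5.
surrogate = lambda mu, s: -np.sum((mu - 2.0) ** 2) - np.sum((s - 0.5) ** 2)
mu_old, sigma_old = np.zeros(3), np.ones(3)
print(penalized_objective(0.5 * np.ones(3), 0.9 * np.ones(3), mu_old, sigma_old, surrogate))
```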

Robust Reinforcement Learning with Wasserstein Constraint [article]

Linfang Hou, Liang Pang, Xin Hong, Yanyan Lan, Zhiming Ma, Dawei Yin
2020 arXiv   pre-print
Robust Reinforcement Learning aims to find the optimal policy with a certain degree of robustness to environmental dynamics.  ...  algorithm: the Wasserstein Robust Advantage Actor-Critic algorithm (WRAAC).  ...  In Section 3, we describe the framework of Wasserstein robust reinforcement learning.  ...
arXiv:2006.00945v1 fatcat:zsxnk3qjvzfihdkan4i2ks6rc4

Reinforcement Learning with Wasserstein Distance Regularisation, with Applications to Multipolicy Learning [article]

Mohammed Amin Abdullah, Aldo Pacchiano, Moez Draief
2019 arXiv   pre-print
We describe an application of the Wasserstein distance to reinforcement learning.  ...  This can be used to learn multiple policies that differ in terms of such Wasserstein distances by using a Wasserstein regulariser.  ...  Figure 3: Learning two policies by repulsive Wasserstein regularisation.  ...
arXiv:1802.03976v2 fatcat:w2uwyqqy5rb3tkk6srpo5vdqta

Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion [article]

Josh Roy, George Konidaris
2020 arXiv   pre-print
We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in reinforcement learning that explicitly learns to align the distributions of extracted features  ...  WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective.  ...  Transfer in reinforcement learning has been a topic of interest since long before the recent advent of deep neural networks and deep reinforcement learning.  ...
arXiv:2006.03465v1 fatcat:yfobpv4tdrhp5njilmgygf5qnu
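Minimizing a Wasserstein-1 distance between two feature distributions is usually done through the Kantorovich-Rubinstein dual, with a critic constrained to be approximately 1-Lipschitz. The PyTorch sketch below shows that generic recipe with a gradient penalty; it is a schematic of the dual estimator only, not the WAPPO objective or architecture, and the random feature batches stand in for encoder outputs from source and target domains.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

def w1_critic_loss(f_src, f_tgt, gp_weight=10.0):
    """Negative dual W1 estimate plus a WGAN-GP style gradient penalty."""
    w1_est = critic(f_src).mean() - critic(f_tgt).mean()
    # Gradient penalty on random interpolates keeps the critic near 1-Lipschitz.
    alpha = torch.rand(f_src.size(0), 1)
    inter = (alpha * f_src + (1 - alpha) * f_tgt).requires_grad_(True)
    grad = torch.autograd.grad(critic(inter).sum(), inter, create_graph=True)[0]
    gp = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()
    return -w1_est + gp_weight * gp

# Hypothetical feature batches standing in for source/target encoder outputs.
f_src = torch.randn(128, 32) + 1.0
f_tgt = torch.randn(128, 32)
for _ in range(5):
    opt.zero_grad()
    loss = w1_critic_loss(f_src, f_tgt)
    loss.backward()
    opt.step()
# The feature extractor would then be trained to *decrease* the critic's W1 estimate,
# pushing source and target feature distributions together.
```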

On Wasserstein Reinforcement Learning and the Fokker-Planck equation [article]

Pierre H. Richemond, Brendan Maginnis
2017 arXiv   pre-print
We derive policy gradients where the change in policy is limited to a small Wasserstein distance (or trust region).  ...  We show that in the small-step limit with respect to the Wasserstein distance W_2, policy dynamics are governed by the Fokker-Planck (heat) equation, following the Jordan-Kinderlehrer-Otto result.  ...  It is our hope that a Wasserstein loss, by implying relevant semantic directions in action space, will speed up convergence and training of reinforcement learning agents.  ...
arXiv:1712.07185v1 fatcat:pfq7g32ffngdfahp376ytrqfk4
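The Jordan-Kinderlehrer-Otto result referenced here is the minimizing-movement scheme in which each small W_2-constrained step solves a proximal problem; as the step size goes to zero the iterates follow the Fokker-Planck equation. The standard statement (not specific to this paper's derivation) is:

```latex
% JKO scheme: each step is a W_2-proximal update of a free-energy functional F.
\rho_{k+1} \;=\; \arg\min_{\rho}\;
  \Big\{ F(\rho) \;+\; \tfrac{1}{2\tau}\, W_2^2(\rho,\rho_k) \Big\},
\qquad
F(\rho) \;=\; \int V(x)\,\rho(x)\,dx \;+\; \beta^{-1}\!\int \rho(x)\log\rho(x)\,dx .
```

As the step size tau tends to zero, the interpolated iterates converge to the Fokker-Planck dynamics \partial_t \rho = \nabla\cdot(\rho\,\nabla V) + \beta^{-1}\Delta\rho, which is the "heat equation" limit the snippet alludes to.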

Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning [article]

Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman
2018 arXiv   pre-print
Learning a generative model is a key component of model-based reinforcement learning.  ...  This equivalence improves our understanding of value-aware models, and also creates a theoretical foundation for applications of the Wasserstein metric in model-based reinforcement learning.  ...  distributions in reinforcement learning (Bellemare et al., 2017).  ...
arXiv:1806.01265v2 fatcat:7il5xbtadvel5i3kdks5c5xsse
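The connection behind the equivalence is the Kantorovich-Rubinstein dual of the Wasserstein-1 distance between the true and learned transition models; written in that notation (a standard identity, with the interpretation hedged to match the snippet), a value-aware model loss that takes its supremum over 1-Lipschitz value functions coincides with this dual form:

```latex
% Kantorovich--Rubinstein dual of W_1 between true and learned transition models:
W_1\!\big(P(\cdot\mid s,a),\,\hat P(\cdot\mid s,a)\big)
  \;=\; \sup_{\lVert f \rVert_{L}\,\le\,1}\;
  \Big(\, \mathbb{E}_{s'\sim P}\big[f(s')\big]
        \;-\; \mathbb{E}_{s'\sim \hat P}\big[f(s')\big] \,\Big).
```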

Adversarial Intrinsic Motivation for Reinforcement Learning [article]

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone
2021 arXiv   pre-print
Specifically, this paper focuses on goal-conditioned reinforcement learning where the idealized (unachievable) target distribution has full measure at the goal.  ...  In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning  ...  In this section we consider the problem of goal-conditioned reinforcement learning.  ...
arXiv:2105.13345v3 fatcat:h4l5prweojdiddwxh2bo3ypyna
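When the target distribution places all of its mass at the goal, the Wasserstein-1 distance simplifies: every unit of visitation mass must be transported to the goal, so the cost is just the expected ground distance to the goal. The sketch below computes that estimate from rollout states, assuming a Euclidean ground metric purely for illustration (the paper is concerned with distances better suited to the MDP's dynamics).

```python
import numpy as np

def w1_to_goal(states, goal):
    """W1 between the empirical state-visitation distribution and a Dirac at `goal`:
    all mass moves to the goal, so the optimal cost is the mean ground distance."""
    return np.linalg.norm(states - goal, axis=1).mean()

# Toy rollout in a 2-D state space.
rng = np.random.default_rng(0)
states = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(256, 2))
goal = np.array([3.0, 3.0])
print(w1_to_goal(states, goal))  # an intrinsic objective would drive this toward zero
```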

Wasserstein Distance Maximizing Intrinsic Control [article]

Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih
2021 arXiv   pre-print
This paper presents an approach that rewards the agent for learning skills that maximize the Wasserstein distance of their state visitation from the start state of the skill.  ...  This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.  ...  the structure of the reinforcement learning problem.  ... 
arXiv:2110.15331v1 fatcat:fuicl26mv5a5fbxcblzexu2mim

Distributional Reinforcement Learning with Quantile Regression [article]

Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos
2017 arXiv   pre-print
Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function.  ...  In reinforcement learning, an agent interacts with the environment by taking actions and observing the next state and reward.  ...  learning over the Wasserstein metric.  ...
arXiv:1710.10044v1 fatcat:46adgv3uwjbmnkoe6ccavradyu
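The quantile-regression approach represents the return distribution by a fixed set of quantiles and trains them with an asymmetric (quantile) Huber loss, which the paper connects to minimizing a Wasserstein distance to the target distribution. Below is a minimal NumPy sketch of that loss for a single state-action pair; it is a schematic of the loss only, not the full QR-DQN agent, and the target samples are made up for illustration.

```python
import numpy as np

def quantile_huber_loss(theta, targets, kappa=1.0):
    """Quantile-regression Huber loss (QR-DQN style) for one state-action pair.
    theta:   (N,) current quantile estimates of the return distribution
    targets: (M,) sampled Bellman targets r + gamma * Z(s', a')"""
    N = theta.shape[0]
    taus = (np.arange(N) + 0.5) / N                       # quantile midpoints tau_hat_i
    delta = targets[None, :] - theta[:, None]             # (N, M) pairwise TD errors
    huber = np.where(np.abs(delta) <= kappa,
                     0.5 * delta ** 2,
                     kappa * (np.abs(delta) - 0.5 * kappa))
    weight = np.abs(taus[:, None] - (delta < 0).astype(float))
    return (weight * huber / kappa).mean(axis=1).sum()    # mean over targets, sum over quantiles

# Toy example: 51 quantile estimates against sampled targets.
rng = np.random.default_rng(0)
theta = np.linspace(-1.0, 1.0, 51)
targets = rng.normal(loc=0.5, scale=0.3, size=32)
print(quantile_huber_loss(theta, targets))
```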

Reinforced Wasserstein Training for Severity-Aware Semantic Segmentation in Autonomous Driving [article]

Xiaofeng Liu, Yimeng Zhang, Xiongchang Liu, Song Bai, Site Li, Jane You
2020 arXiv   pre-print
Furthermore, an adaptive learning scheme for the ground matrix is proposed to utilize the high-fidelity CARLA simulator. Specifically, we follow an alternating reinforcement learning scheme.  ...  The ground metric of the Wasserstein distance can be pre-defined following experience on a specific task.  ...  loss (w/) in our reinforcement learning framework.  ...
arXiv:2008.04751v1 fatcat:qr75d5lmjjakdhb5zsgcealfpy
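One common special case of a pre-defined ground metric is an ordinal one, d(i, j) = |i - j|, over severity-ranked classes; in that case the Wasserstein-1 distance between a predicted class distribution and the target reduces to the L1 distance between their cumulative distributions, which is cheap enough to use directly as a training loss. The sketch below covers only this special case (the paper's adaptively learned ground matrix would need a general optimal-transport solver instead), and the class probabilities are illustrative values.

```python
import numpy as np

def w1_ordinal(p, q):
    """Wasserstein-1 between two distributions over ordered classes 0..K-1
    with ground metric |i - j|: the L1 distance between their CDFs."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

# Classes ranked by the severity of a misclassification (illustrative values).
pred   = np.array([0.05, 0.15, 0.60, 0.15, 0.05])   # softmax output
target = np.array([0.0,  0.0,  0.0,  1.0,  0.0])    # one-hot ground truth
print(w1_ordinal(pred, target))  # penalizes predictions far from the true class in severity
```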

CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem [article]

Arpan Kusari
2019 arXiv   pre-print
Inverse reinforcement learning (IRL) is used to infer the reward function from the actions of an expert running a Markov Decision Process (MDP).  ...  The reward function is derived using a well-known deep generative model, the Conditional Variational Auto-encoder (CVAE), with a Wasserstein loss function, and is thus referred to as the Conditional Wasserstein Auto-Encoder-IRL (CWAE-IRL).  ...  In this paper, my primary argument is that the inverse reinforcement learning problem can be posed as a supervised learning problem with learning of a latent variable  ...
arXiv:1910.00584v1 fatcat:yqw3exohanfpjpfofmglaxulhy

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration [article]

Ming Zhang, Yawei Wang, Xiaoteng Ma, Li Xia, Jun Yang, Zhiheng Li, Xiu Li
2020 arXiv   pre-print
In this paper, we propose a new algorithm named Wasserstein Distance guided Adversarial Imitation Learning (WDAIL) for promoting the performance of imitation learning (IL).  ...  the reinforcement learning stage, which is much simpler to implement and makes the algorithm more efficient, and (c) exploring different reward function shapes to suit different tasks and improve the  ...  In general, imitation learning can typically be divided into two categories: behavioral cloning (BC) and inverse reinforcement learning (IRL).  ...
arXiv:2006.03503v1 fatcat:i5atdc456vfeheaxqijf2kys7i

WRGAN: Improvement of RelGAN with Wasserstein Loss for Text Generation

Ziyun Jiao, Fuji Ren
2021 Electronics  
Compared with RelGAN's existing loss function, the Wasserstein distance can provide more information to the generator, but RelGAN does not work well with the Wasserstein distance in experiments.  ...  In this paper, we propose an improved neural network based on RelGAN with a Wasserstein loss, named WRGAN.  ...  Many subsequent models also rely on reinforcement learning (RL) algorithms.  ...
doi:10.3390/electronics10030275 fatcat:d6amgpspkrhyxdmqlh6x5ygczi
Showing results 1–15 of 2,135.