Wasserstein Unsupervised Reinforcement Learning
[article] 2021, arXiv pre-print
Therefore we propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), in which we directly maximize the distance between state distributions induced by different policies. ...
Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. ...
Nonetheless, the primal-form estimation methods show competitive accuracy empirically. (Section 3, Wasserstein unsupervised reinforcement learning; Section 3.1, MI-based unsupervised reinforcement learning.) Traditional unsupervised ...
arXiv:2110.07940v1
fatcat:m5uljtzqwndeldfytdmmms4uaa
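As a point of reference for the primal-form estimation mentioned in the snippet above, here is a minimal sketch (not the WURL algorithm itself) of the Wasserstein-1 distance between two policies' empirical state distributions. It assumes equal-sized sample sets and a Euclidean ground metric, both of which are illustrative choices.

```python
# A minimal sketch, assuming equal-sized state-sample sets and a Euclidean
# ground metric: for two uniform empirical measures with the same number of
# points, the optimal transport plan is a permutation, so the exact
# Wasserstein-1 distance is the cost of the optimal matching.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def empirical_w1(states_a: np.ndarray, states_b: np.ndarray) -> float:
    """Exact W1 between two equal-sized, uniformly weighted sample sets."""
    cost = cdist(states_a, states_b)          # pairwise ground costs
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(cost[rows, cols].mean())

# Toy usage: state samples gathered by two different skills/policies.
rng = np.random.default_rng(0)
states_pi1 = rng.normal(loc=0.0, size=(256, 2))
states_pi2 = rng.normal(loc=3.0, size=(256, 2))
print(empirical_w1(states_pi1, states_pi2))   # large when the policies visit different regions
```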
Wasserstein Robust Reinforcement Learning
[article] 2019, arXiv pre-print
Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. ...
Reinforcement learning algorithms, though successful, tend to over-fit to training environments hampering their application to the real-world. ...
Wasserstein Robust Reinforcement Learning: This section formalises robust reinforcement learning by equipping agents with the capability of determining well-behaved policies under worst-case models which ...
arXiv:1907.13196v4
fatcat:qeldcwoy6jbh7jnauw62acmxqu
Efficient Wasserstein Natural Gradients for Reinforcement Learning
[article] 2021, arXiv pre-print
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). ...
The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. ...
Introduction: Defining efficient optimization algorithms for reinforcement learning (RL) that are able to leverage a meaningful measure of similarity between policies is a longstanding and challenging problem ...
arXiv:2010.05380v4
fatcat:7bhuob2ngjfdbfzmn4q44gyxra
Robust Reinforcement Learning with Wasserstein Constraint
[article] 2020, arXiv pre-print
Robust Reinforcement Learning aims to find the optimal policy with some degree of robustness to environmental dynamics. ...
... algorithm: the Wasserstein Robust Advantage Actor-Critic algorithm (WRAAC). ...
In Section 3, we mainly describe the framework of Wasserstein robust Reinforcement Learning. ...
arXiv:2006.00945v1
fatcat:zsxnk3qjvzfihdkan4i2ks6rc4
Reinforcement Learning with Wasserstein Distance Regularisation, with Applications to Multipolicy Learning
[article] 2019, arXiv pre-print
We describe an application of Wasserstein distance to Reinforcement Learning. ...
This can be used to learn multiple policies that differ in terms of such Wasserstein distances by using a Wasserstein regulariser. ...
Figure 3: Learning two policies by repulsive Wasserstein regularisation. ...
arXiv:1802.03976v2
fatcat:w2uwyqqy5rb3tkk6srpo5vdqta
Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion
[article] 2020, arXiv pre-print
We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in Reinforcement Learning that explicitly learns to align the distributions of extracted ...
WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. ...
Transfer in Reinforcement Learning: Transfer in reinforcement learning has been a topic of interest since long before the recent advent of Deep Neural Networks and Deep Reinforcement Learning. ...
arXiv:2006.03465v1
fatcat:yfobpv4tdrhp5njilmgygf5qnu
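The Wasserstein Confusion idea in the entry above follows the general WGAN-style recipe of estimating the Wasserstein-1 distance with a Lipschitz critic and then training the feature extractor against that estimate. The sketch below is a hedged illustration of that generic recipe, not WAPPO's exact objective; the feature dimension, network sizes, and gradient-penalty weight are assumptions.

```python
# A hedged sketch (not WAPPO's exact objective): estimate W1 between source-
# and target-domain feature distributions with a Lipschitz critic trained via
# a WGAN-GP-style gradient penalty, then train the feature extractor to shrink
# the same estimate. Feature dimension (64), network sizes, and the penalty
# weight (10.0) are illustrative assumptions.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

def gradient_penalty(feats_src: torch.Tensor, feats_tgt: torch.Tensor) -> torch.Tensor:
    # Penalize the critic's gradient norm on random interpolates (soft Lipschitz constraint).
    eps = torch.rand(feats_src.size(0), 1)
    interp = (eps * feats_src + (1.0 - eps) * feats_tgt).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

def critic_loss(feats_src: torch.Tensor, feats_tgt: torch.Tensor) -> torch.Tensor:
    # The critic maximizes E[f(src)] - E[f(tgt)], an estimate of W1; we minimize the negative.
    w1_estimate = critic(feats_src).mean() - critic(feats_tgt).mean()
    return -w1_estimate + 10.0 * gradient_penalty(feats_src, feats_tgt)

def confusion_loss(feats_src: torch.Tensor, feats_tgt: torch.Tensor) -> torch.Tensor:
    # The feature extractor is updated to reduce the same W1 estimate,
    # aligning source and target feature distributions.
    return critic(feats_src).mean() - critic(feats_tgt).mean()
```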
On Wasserstein Reinforcement Learning and the Fokker-Planck equation
[article] 2017, arXiv pre-print
We derive policy gradients where the change in policy is limited to a small Wasserstein distance (or trust region). ...
We show that in the small steps limit with respect to the Wasserstein distance W_2, policy dynamics are governed by the Fokker-Planck (heat) equation, following the Jordan-Kinderlehrer-Otto result. ...
It is our hope that a Wasserstein loss, by implying relevant semantic directions in action space, will speed up convergence and training of reinforcement learning agents. ...
arXiv:1712.07185v1
fatcat:pfq7g32ffngdfahp376ytrqfk4
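For readers unfamiliar with the Jordan-Kinderlehrer-Otto (JKO) result cited in this entry, the following is a brief restatement in generic notation (iterate rho_k, step size tau, potential V, inverse temperature beta); it is background, not a derivation from the paper itself.

```latex
% Generic notation: iterate \rho_k, step size \tau, potential V, inverse temperature \beta.
\begin{equation}
  \rho_{k+1} \in \operatorname*{arg\,min}_{\rho}
    \;\frac{1}{2\tau}\, W_2^2(\rho, \rho_k) + F(\rho),
  \qquad
  F(\rho) = \int V(x)\,\rho(x)\,dx \;+\; \beta^{-1}\!\int \rho(x)\log\rho(x)\,dx .
\end{equation}
% As \tau \to 0, the JKO iterates follow the Fokker--Planck equation
\begin{equation}
  \partial_t \rho \;=\; \nabla\!\cdot\!\big(\rho\,\nabla V\big) \;+\; \beta^{-1}\,\Delta\rho .
\end{equation}
```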
Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning
[article] 2018, arXiv pre-print
Learning a generative model is a key component of model-based reinforcement learning. ...
This equivalence improves our understanding of value-aware models, and also creates a theoretical foundation for applications of Wasserstein in model-based reinforcement~learning. ...
distributions in reinforcement learning (Bellemare et al., 2017). ...
arXiv:1806.01265v2
fatcat:7il5xbtadvel5i3kdks5c5xsse
Adversarial Intrinsic Motivation for Reinforcement Learning
[article] 2021, arXiv pre-print
Specifically, this paper focuses on goal-conditioned reinforcement learning where the idealized (unachievable) target distribution has full measure at the goal. ...
In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement ...
Wasserstein-1 Distance for Goal-Conditioned Reinforcement Learning: In this section we consider the problem of goal-conditioned reinforcement learning. ...
arXiv:2105.13345v3
fatcat:h4l5prweojdiddwxh2bo3ypyna
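One useful special case behind this entry's objective: when the idealized target distribution places all of its mass on a single goal, the Wasserstein-1 distance from the policy's state-visitation distribution reduces to the expected ground-metric distance to that goal. The sketch below illustrates only that identity; it is not the paper's adversarial estimation procedure, and the Euclidean metric is an assumption.

```python
# A small assumption-level sketch, not the paper's adversarial estimator:
# if the target distribution is a point mass (Dirac) at a goal g, then
# W1(rho_pi, delta_g) = E_{s ~ rho_pi}[ d(s, g) ], i.e. the average
# ground-metric distance from visited states to the goal. Euclidean d is assumed.
import numpy as np

def w1_to_goal(visited_states: np.ndarray, goal: np.ndarray) -> float:
    """W1 between an empirical state-visitation distribution and a Dirac at `goal`."""
    return float(np.linalg.norm(visited_states - goal, axis=1).mean())

# Toy usage: a shorter average distance to the goal means a smaller W1 objective.
states = np.random.default_rng(1).uniform(-1.0, 1.0, size=(100, 2))
print(w1_to_goal(states, goal=np.zeros(2)))
```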
Wasserstein Distance Maximizing Intrinsic Control
[article] 2021, arXiv pre-print
This paper presents an approach that rewards the agent for learning skills that maximize the Wasserstein distance of their state visitation from the start state of the skill. ...
This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal. ...
the structure of the reinforcement learning problem. ...
arXiv:2110.15331v1
fatcat:fuicl26mv5a5fbxcblzexu2mim
Distributional Reinforcement Learning with Quantile Regression
[article] 2017, arXiv pre-print
Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. ...
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. ...
learning over the Wasserstein metric. ...
arXiv:1710.10044v1
fatcat:46adgv3uwjbmnkoe6ccavradyu
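The quantile-regression approach referenced above trains quantile estimates of the return distribution with an asymmetric Huber loss. The following is a condensed sketch of that loss as described in the QR-DQN paper; the tensor shapes and kappa = 1.0 are illustrative choices rather than the reference implementation.

```python
# A condensed sketch of the quantile-regression Huber loss from QR-DQN;
# tensor shapes and kappa = 1.0 are illustrative choices.
import torch

def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        taus: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    """pred: (batch, N) predicted quantiles; target: (batch, M) target quantiles;
    taus: (N,) quantile midpoints in (0, 1)."""
    # Pairwise TD errors: delta[b, i, j] = target[b, j] - pred[b, i].
    delta = target.unsqueeze(1) - pred.unsqueeze(2)
    abs_delta = delta.abs()
    huber = torch.where(abs_delta <= kappa,
                        0.5 * delta ** 2,
                        kappa * (abs_delta - 0.5 * kappa))
    # Asymmetric quantile weight |tau_i - 1{delta < 0}| penalizes over- and
    # under-estimation of each quantile differently.
    weight = (taus.view(1, -1, 1) - (delta.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=1).mean()

# Toy usage with N = M = 4 quantiles.
taus = (torch.arange(4, dtype=torch.float32) + 0.5) / 4
loss = quantile_huber_loss(torch.randn(8, 4), torch.randn(8, 4), taus)
```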
Reinforced Wasserstein Training for Severity-Aware Semantic Segmentation in Autonomous Driving
[article] 2020, arXiv pre-print
Furthermore, an adaptive learning scheme for the ground matrix is proposed to utilize the high-fidelity CARLA simulator. Specifically, we follow an alternating reinforcement learning scheme. ...
The ground metric of the Wasserstein distance can be pre-defined based on prior experience with a specific task. ...
loss (w/) in our reinforcement learning framework. ...
arXiv:2008.04751v1
fatcat:qr75d5lmjjakdhb5zsgcealfpy
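To make the "pre-defined ground metric" above concrete: with a one-hot target at class j, the discrete Wasserstein loss between the predicted class distribution p and the target reduces to the expected ground cost sum_i p_i * D[i, j]. The sketch below shows only this reduction with a made-up 3x3 severity matrix; it is not the paper's reinforcement-based scheme for adapting D.

```python
# A hedged sketch of the reduction behind a pre-defined ground metric: for a
# one-hot target at class j, the discrete Wasserstein loss between the predicted
# class distribution p and the target equals sum_i p_i * D[i, j]. The 3x3
# "severity" matrix D below is a made-up example, not the paper's learned matrix.
import torch

D = torch.tensor([[0.0, 1.0, 5.0],    # e.g. road
                  [1.0, 0.0, 4.0],    # e.g. sidewalk
                  [5.0, 4.0, 0.0]])   # e.g. pedestrian (severe confusions cost more)

def wasserstein_loss(logits: torch.Tensor, target_classes: torch.Tensor) -> torch.Tensor:
    """logits: (batch, C); target_classes: (batch,) integer class labels."""
    p = torch.softmax(logits, dim=1)       # predicted class distribution per sample
    costs = D[:, target_classes].t()       # (batch, C): ground cost of each class to the true class
    return (p * costs).sum(dim=1).mean()

loss = wasserstein_loss(torch.randn(4, 3), torch.tensor([0, 2, 1, 0]))
```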
CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem
[article] 2019, arXiv pre-print
Inverse reinforcement learning (IRL) is used to infer the reward function from the actions of an expert running a Markov Decision Process (MDP). ...
The reward function is derived using a well-known deep generative model, the Conditional Variational Auto-encoder (CVAE), with a Wasserstein loss function, hence referred to as Conditional Wasserstein ...
Conditional Wasserstein Auto-Encoder-IRL: In this paper, my primary argument is that the inverse reinforcement learning problem can be formulated as a supervised learning problem with learning of a latent variable ...
arXiv:1910.00584v1
fatcat:yqw3exohanfpjpfofmglaxulhy
Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration
[article] 2020, arXiv pre-print
In this paper, we propose a new algorithm named Wasserstein Distance guided Adversarial Imitation Learning (WDAIL) for promoting the performance of imitation learning (IL). ...
the reinforcement learning stage which is much simpler to implement and makes the algorithm more efficient, and (c) exploring different reward function shapes to suit different tasks for improving the ...
In general, imitation learning can be typically divided into two categories: behavioral cloning (BC) and inverse reinforcement learning (IRL). ...
arXiv:2006.03503v1
fatcat:i5atdc456vfeheaxqijf2kys7i
WRGAN: Improvement of RelGAN with Wasserstein Loss for Text Generation
2021, Electronics
Compared with the current loss function, the Wasserstein distance can provide more information to the generator, but RelGAN does not work well with the Wasserstein distance in experiments. ...
In this paper, we propose an improved neural network based on RelGAN and Wasserstein loss named WRGAN. ...
Many subsequent models also rely on reinforcement learning (RL) algorithms. ...
doi:10.3390/electronics10030275
fatcat:d6amgpspkrhyxdmqlh6x5ygczi
Showing results 1 — 15 out of 2,135 results