6,252 Hits in 5.4 sec

Training Agents using Upside-Down Reinforcement Learning [article]

Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber
2021 arXiv   pre-print
We develop Upside-Down Reinforcement Learning (UDRL), a method for learning to act using only supervised learning techniques.  ...  Based on these results, we suggest that alternative approaches to expected reward maximization have an important role to play in training useful autonomous agents.  ...  Algorithm 1 Upside-Down Reinforcement Learning: High-level Description.  ... 
arXiv:1912.02877v2 fatcat:habglrwq2fezxnwfeueqapmzom

Learning Relative Return Policies With Upside-Down Reinforcement Learning [article]

Dylan R. Ashley, Kai Arulkumaran, Jürgen Schmidhuber, Rupesh Kumar Srivastava
2022 arXiv   pre-print
We show that upside-down reinforcement learning can learn to carry out such commands online in a tabular bandit setting and in CartPole with non-linear function approximation.  ...  Lately, there has been a resurgence of interest in using supervised learning to solve reinforcement learning problems.  ...  Upside-down reinforcement learning breaks the RL problem in two, using commands as an intermediary.  ... 
arXiv:2202.12742v2 fatcat:6jnw63ayejexlbtcnaajxwtgfq

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL [article]

Kai Arulkumaran, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh K. Srivastava
2022 arXiv   pre-print
Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down, by taking returns as input and predicting actions.  ...  With a general agent architecture, a single UDRL agent can learn across all paradigms.  ...  Upside Down RL The core of UDRL is the policy, π, conditioned on "commands", c [31].  ... 
arXiv:2202.11960v1 fatcat:txc7slfgufahbgvtowlxlwuq2q

Learning Purely Tactile In-Hand Manipulation with a Torque-Controlled Hand

Leon Sievers, Johannes Pitz, Berthold Bäuml
2022 arXiv   pre-print
We efficiently train in a precisely modeled and identified rigid body simulation with off-policy deep reinforcement learning, significantly sped up by a domain adapted curriculum, leading to a moderate  ...  600 CPU hours of training time.  ...  LEARNING THE MANIPULATION TASK 1 A. Learning Algorithm To learn the weights of the neural network controller we use reinforcement learning (RL).  ... 
arXiv:2204.03698v2 fatcat:3lwmt5vzd5g5pm7t7mwe67qdz4

Deep reinforcement learning on a multi-asset environment for trading [article]

Ali Hirsa, Joerg Osterrieder, Branka Hadji-Misheva, Jan-Alexander Posth
2021 arXiv   pre-print
The trained reinforcement learning agent is applied to trading the E-mini S&P 500 continuous futures contract. Our results in this study are preliminary and need further improvement.  ...  Deep reinforcement learning (DRL), a recently reinvigorated method with significant success in multiple domains, still has to show its benefit in the financial markets.  ...  Introduction Reinforcement learning (RL) is an area of machine learning concerned with how agents ought to take actions in an environment to maximize the notion of cumulative reward.  ... 
arXiv:2106.08437v1 fatcat:5spab6lt6fc6tmhapsuurnoz5i

Learning of feature points without additional supervision improves reinforcement learning from images [article]

Rinu Boney, Alexander Ilin, Juho Kannala
2022 arXiv   pre-print
Previous works show that feature points learned using unsupervised pre-training or human supervision can provide good features for control tasks.  ...  This information can be represented using feature points, which is a list of spatial locations in learned feature maps of an input image.  ...  A high to low drop in this value indicates that the cheetah has flipped upside down.  ... 
arXiv:2106.07995v3 fatcat:jjtbqo2srzb5rleixocroltb7i

Object-sensitive Deep Reinforcement Learning [article]

Yuezhang Li, Katia Sycara, Rahul Iyer
2018 arXiv   pre-print
We also propose a new approach called "object saliency maps" to visually explain the actions made by deep reinforcement learning agents.  ...  Although objects are important image elements, few work considers enhancing deep reinforcement learning with object characteristics.  ...  Related Work Deep Reinforcement Learning Reinforcement learning is defined as learning a policy for an agent to interact with the unknown environment.  ... 
arXiv:1809.06064v1 fatcat:nttn2vu2kzhfnpyhqnvibpedte

Control of a Quadrotor With Reinforcement Learning

Jemin Hwangbo, Inkyu Sa, Roland Siegwart, Marco Hutter
2017 IEEE Robotics and Automation Letters  
In this paper, we present a method to control a quadrotor with a neural network trained using reinforcement learning techniques.  ...  With reinforcement learning, a common network can be trained to directly map state to actuator command making any predefined control structure obsolete for training.  ...  In this work, we show more dynamic motion (i.e. dynamic stabilization from an upside-down throws) can be achieved with reinforcement learning.  ... 
doi:10.1109/lra.2017.2720851 dblp:journals/ral/HwangboSSH17 fatcat:4ardg4ea2bdbldghxykyotsif4

Multi-Game Decision Transformers [article]

Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch
2022 arXiv   pre-print
Motivated by this progress, we investigate whether the same strategy can be used to produce generalist reinforcement learning agents.  ...  A longstanding goal of the field of AI is a strategy for compiling diverse experience into a highly capable, generalist agent.  ...  For Upside-Down RL comparison experiments Section 4.9, we also provide median human-normalized scores in Figure 13.  ... 
arXiv:2205.15241v1 fatcat:q737syhv2fdgrhbexu47ti64eu

Deep Reinforcement Learning-based Continuous Control for Multicopter Systems

Anush Manukyan, Miguel A. Olivares-Mendez, Matthieu Geist, Holger Voos
2019 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT)  
The deep reinforcement learning method used for the training is model-free, on-policy, actor-critic based algorithm called Trust Region Policy Optimization (TRPO).  ...  We present a framework based on OpenAI GYM, Gazebo and RotorS MAV simulator, utilized for successfully training different agents to perform various tasks.  ...  [9], where they propose a novel learning algorithm. They use neural networks in order to train a quadrotor to stabilize itself in the air even from manually thrown upside-down position.  ... 
doi:10.1109/codit.2019.8820368 dblp:conf/codit/ManukyanOGV19 fatcat:jgnuw3lzsrfpjauc33ibq3fhwe

Towards Autonomous Pipeline Inspection with Hierarchical Reinforcement Learning [article]

Nicolò Botteghi, Luuk Grefte, Mannes Poel, Beril Sirmacek, Christoph Brune, Edwin Dertien, Stefano Stramigioli
2021 arXiv   pre-print
Moreover, we introduce a hierarchical policy decomposition based on Hierarchical Reinforcement Learning to learn robust high-level navigation skills.  ...  To address this problem, we investigate the usage of Deep Reinforcement Learning for achieving autonomous navigation of in-pipe robots in pipeline networks with complex topologies.  ...  The PPO implementation of RLlib is used for training the agents. C.  ... 
arXiv:2107.03685v1 fatcat:r2t526kdfna3bhcllhidp6dfxi

Agent Inspired Trading Using Recurrent Reinforcement Learning and LSTM Neural Networks [article]

David W. Lu
2017 arXiv   pre-print
The learning model is implemented in Long Short Term Memory (LSTM) recurrent structures with Reinforcement Learning or Evolution Strategies acting as agents. The robustness and feasibility of the system  ...  In order to accomplish similar level of performance and generality, like a human trader, our agents learn for themselves to create successful strategies that lead to the human-level long-term rewards.  ...  It is easier for us to explain and implement as an agent based reinforcement learning model.  ... 
arXiv:1707.07338v1 fatcat:f3hlkanncjbd3cjfqemx5t627u

On the Verge of Solving Rocket League using Deep Reinforcement Learning and Sim-to-sim Transfer [article]

Marco Pleines, Konstantin Ramthun, Yannik Wegener, Hendrik Meyer, Matthias Pallasch, Sebastian Prior, Jannik Drögemüller, Leon Büttinghaus, Thilo Röthemeyer, Alexander Kaschwig, Oliver Chmurzynski, Frederik Rohkrähmer (+3 others)
2022 arXiv   pre-print
In the case of Rocket League, we demonstrate that single behaviors of goalies and strikers can be successfully learned using Deep Reinforcement Learning in the simulation environment and transferred back  ...  Therefore, the trained agent is robust enough and able to generalize to the target domain of Rocket League.  ...  Friction (Air, Ground) and Drifting [25] A drag of −525 uu/s² is used, which is reduced by more than half when the car is upside down.  ... 
arXiv:2205.05061v2 fatcat:lifolk7tnnbnbi6dombd4cykxi

Extending Deep Reinforcement Learning Frameworks in Cryptocurrency Market Making [article]

Jonathan Sadighian
2020 arXiv   pre-print
Two policy-based agents are trained to learn a market making trading strategy using eight days of training data and evaluate their performance using 30 days of testing data.  ...  Reinforcement learning has been applied to single- and multi-instrument use cases, such as market making or portfolio management.  ...  ACKNOWLEDGEMENTS Thank you to Toussaint Behaghel for reviewing the paper and providing helpful feedback and Florian Labat for suggesting the use of price-based events in reinforcement learning.  ... 
arXiv:2004.06985v1 fatcat:vilbfgcd5fakxhhsijmrv3ejmq

Emergence of Locomotion Behaviours in Rich Environments [article]

Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver
2017 arXiv   pre-print
Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance.  ...  The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals.  ...  We focus on a set of novel locomotion tasks that go significantly beyond the previous state-of-the-art for agents trained directly from reinforcement learning.  ... 
arXiv:1707.02286v2 fatcat:7uxwftkkg5anrifb6lqh3wpaei
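
The upside-down RL results at the top of this list describe a policy that takes a desired return as input and predicts an action, trained with supervised learning only. A minimal tabular sketch of that idea follows. It is not the algorithm from any of the listed papers: the deterministic three-armed bandit, the reward values, and the names `ARM_REWARDS`, `collect`, `fit`, and `act` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical deterministic 3-armed bandit (assumption): arm index -> reward.
ARM_REWARDS = {0: 0.0, 1: 1.0, 2: 2.0}


def collect(num_steps, seed=0):
    """Explore with uniformly random actions and log (observed return, action) pairs."""
    rng = random.Random(seed)
    arms = list(ARM_REWARDS)
    data = []
    for _ in range(num_steps):
        action = rng.choice(arms)
        data.append((ARM_REWARDS[action], action))
    return data


def fit(data):
    """Supervised step: count which action produced each observed return."""
    counts = defaultdict(lambda: defaultdict(int))
    for observed_return, action in data:
        counts[observed_return][action] += 1
    return counts


def act(counts, desired_return):
    """Given a commanded return, pick the action most often seen with the nearest logged return."""
    nearest = min(counts, key=lambda r: abs(r - desired_return))
    actions = counts[nearest]
    return max(actions, key=actions.get)


behavior = fit(collect(200))
print(act(behavior, 2.0))  # action chosen for a commanded return of 2.0
```

Here the return is an input to the policy rather than a quantity to maximize, so asking for a lower return (for example `act(behavior, 0.0)`) yields a different action from the same learned table.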
Showing results 1 to 15 out of 6,252 results