146 Hits in 6.3 sec

Deep reinforcement learning for shared control of mobile robots

Chong Tian, Shahil Shaik, Yue Wang
2021 IET Cyber-Systems and Robotics  
In addition, an extended DAGGER (DAGGERX) human agent is developed for training the RL agent to reduce human workload. Robot simulations and experiments with humans in the loop are conducted.  ...  In this study, the authors develop an extended Twin Delayed Deep Deterministic Policy Gradient (DDPG) (TD3X)-based shared control framework that learns to assist a human operator in teleoperating mobile  ...  Figure 12: Comparison of the average return over the last 10 evaluation loops over 5 seeds for four deep RL-based shared control systems in the complex maze environment.  ... 
doi:10.1049/csy2.12036 fatcat:3lgmxs5zjrhn5czikssgb5bcle
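The DAGGER-style loop mentioned in the snippet above can be sketched as follows. This is a minimal toy illustration of the dataset-aggregation idea, not the authors' DAGGERX code: `expert_action` and `Learner` are hypothetical stand-ins.

```python
import random

def expert_action(state):
    # Hypothetical expert: always move toward the origin.
    return -1 if state > 0 else 1

class Learner:
    """Toy learner that memorises the majority expert label per state."""
    def __init__(self):
        self.data = {}
    def fit(self, dataset):
        self.data = {}
        for s, a in dataset:
            self.data.setdefault(s, []).append(a)
    def act(self, state):
        labels = self.data.get(state)
        if not labels:
            return random.choice([-1, 1])
        return max(set(labels), key=labels.count)

def dagger(env_states, n_iters=3):
    dataset, learner = [], Learner()
    for _ in range(n_iters):
        # Visit states (a fixed toy set here; in real DAgger these come
        # from rolling out the learner) and label each with the expert.
        for s in env_states:
            dataset.append((s, expert_action(s)))
        learner.fit(dataset)  # retrain on the *aggregated* dataset
    return learner

learner = dagger([-2, -1, 1, 2])
```

The key DAgger property the sketch preserves is that the learner is always retrained on the union of all expert-labelled data gathered so far, rather than only the latest batch.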

Sample-Efficient Imitation Learning via Generative Adversarial Nets [article]

Lionel Blondé, Alexandros Kalousis
2019 arXiv   pre-print
Following a model-based approach, (Baram et al., 2017) recovers the gradient of the discriminator with respect to actions (via reparametrization tricks) and with respect to states (via a forward model  ...  In Figure 2 , we use scatter plots to visualise every episodic return, for every random seed.  ... 
arXiv:1809.02064v3 fatcat:xxihg6wl2bdy3kci6kzhoj5lli

A Primer on Maximum Causal Entropy Inverse Reinforcement Learning [article]

Adam Gleave, Sam Toyer
2022 arXiv   pre-print
We hope this will serve both as an introductory resource for those new to the field, and as a concise reference for those already familiar with these topics.  ...  Acknowledgements We would like to thank Alyssa Li Dayan, Michael Dennis, Yawen Duan, Daniel Filan, Erik Jenner, Niklas Lauffer and Cody Wild for feedback on earlier versions of this manuscript.  ...  In contrast to Bayesian IRL, algorithms based on MCE IRL have scaled to high-dimensional environments.  ... 
arXiv:2203.11409v1 fatcat:gpcbomxf3nbbzkhkiw6uzqm36u
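The core computational primitive behind MCE IRL is soft value iteration, which replaces the hard max of the Bellman backup with a log-sum-exp. A minimal sketch on a tabular MDP (array shapes and the toy setup are illustrative assumptions, not the authors' code):

```python
import numpy as np

def soft_value_iteration(R, P, gamma=0.9, iters=200):
    """Soft VI: V(s) = log sum_a exp(R[s,a] + gamma * sum_s' P[s,a,s'] V(s')).

    R has shape (S, A); P has shape (S, A, S'). Returns the soft values
    and the stochastic maximum-causal-entropy policy pi(a|s).
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V                # (S, A) soft Q-values
        V = np.log(np.exp(Q).sum(axis=1))    # log-sum-exp over actions
    pi = np.exp(Q - V[:, None])              # rows sum to 1 by construction
    return V, pi
```

Because the backup is a softmax rather than a max, the resulting policy is stochastic, which is what makes the induced trajectory distribution differentiable in the reward parameters during IRL.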

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [article]

Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
2020 arXiv   pre-print
To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.  ...  Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.  ...  We would like to thank Frans Oliehoek and Wendelin Boehmer for helpful comments and discussion.  ... 
arXiv:2003.08839v2 fatcat:4zm4rksi7jgm7c2le6vp36jvtu
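QMIX's central constraint is that the joint value must be monotone in each agent's utility, which it enforces by keeping the mixing-network weights non-negative. A minimal sketch of that constraint (hypothetical fixed weights in place of the paper's state-conditioned hypernetworks, and ReLU in place of ELU):

```python
import numpy as np

def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """Two-layer mixing network; np.abs() keeps the weights non-negative,
    which makes Q_tot monotone in every agent's Q-value (the QMIX constraint)."""
    hidden = np.maximum(np.abs(w1) @ agent_qs + b1, 0.0)
    return (np.abs(w2) @ hidden + b2).item()
```

Monotonicity is what lets decentralised agents each argmax their own Q-value while jointly argmaxing Q_tot, so centralised training yields consistent decentralised execution.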

Approximate Inference with Amortised MCMC [article]

Yingzhen Li, Richard E. Turner, Qiang Liu
2017 arXiv   pre-print
Experiments consider image modelling with deep generative models as a challenging test for the method.  ...  Deep models trained using amortised MCMC are shown to generate realistic looking samples as well as producing diverse imputations for images with regions of missing pixels.  ...  Though theoretically appealing, this method seems still impractical for large scale problems.  ... 
arXiv:1702.08343v2 fatcat:t7igg5ix7bdgljvz7i6s6iwov4

Mirror Learning: A Unifying Framework of Policy Optimisation [article]

Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster
2022 arXiv   pre-print
Modern deep reinforcement learning (RL) algorithms are motivated by either the general policy improvement (GPI) or trust-region learning (TRL) frameworks.  ...  Mirror learning sets us free to boldly explore novel, theoretically sound RL algorithms, a thus far uncharted wonderland.  ...  Instead, large scale settings employ function approximation and sample based learning.  ... 
arXiv:2201.02373v9 fatcat:3tui6hiucnhwtmeiorqohtzsta

Artificial Intelligence for Prosthetics - challenge solutions [article]

Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu (+31 others)
2019 arXiv   pre-print
In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches.  ...  In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity  ...  Discussion We proposed the Distributed Quantile Ensemble Critic (DQEC), an off-policy RL algorithm for continuous control, which combines a number of recent advances in deep RL.  ... 
arXiv:1902.02441v1 fatcat:hf7xzitrhjdqfb5cfaneovlfa4

Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation [article]

Stephen James, Andrew J. Davison
2022 arXiv   pre-print
Despite the success of reinforcement learning methods, they have yet to have their breakthrough moment when applied to a broad range of robotic manipulation tasks.  ...  All methods use the LeakyReLU activation, layer normalisation in the convolution layers, learning rate of 3 × 10⁻³, soft target update of τ = 5 × 10⁻⁴, and a reward scaling of 100.  ...  In practice we make use of the clipped double-Q trick [38] , which takes the minimum Q-value between two Q networks, but have omitted in the equations for brevity.  ... 
arXiv:2105.14829v2 fatcat:vuvtxbcx4rgxhowqsfu7vty4ii
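The clipped double-Q trick and the soft target update mentioned in the snippet above are both one-liners in practice. A minimal sketch with scalar toy inputs (not the authors' network code):

```python
import numpy as np

def clipped_double_q_target(reward, next_q1, next_q2, gamma=0.99, done=False):
    """TD3-style target: the min over two target critics curbs the
    overestimation bias of a single bootstrapped critic."""
    next_q = min(next_q1, next_q2)
    return reward + (0.0 if done else gamma * next_q)

def soft_update(target_params, online_params, tau=5e-4):
    """Polyak averaging of target-network parameters (the 'soft target update')."""
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

With a small τ the target networks trail the online networks slowly, which stabilises the bootstrapped targets at the cost of slower value propagation.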

Relative Entropy Regularized Policy Iteration [article]

Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller
2018 arXiv   pre-print
We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function.  ...  Our algorithm draws on connections to existing literature on black-box optimization and 'RL as an inference' and it can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm  ...  However the last exponential term is a normalisation constant for q.  ... 
arXiv:1812.02256v1 fatcat:bfwxwdrtejed7hjewdheh2yoxy
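The normalisation constant for q mentioned in the snippet arises in the MPO-style non-parametric policy improvement step, where q(a|s) ∝ π(a|s) exp(Q(s,a)/η). A minimal discrete-action sketch (the uniform-prior test setup is an illustrative assumption, not the authors' implementation):

```python
import numpy as np

def nonparametric_improvement(prior_probs, q_values, eta=1.0):
    """Reweight the prior policy by exponentiated Q-values:
    q(a|s) ∝ prior(a|s) * exp(Q(s,a) / eta).
    Dividing by w.sum() applies the normalisation constant; subtracting
    the max first keeps the exponentials numerically stable."""
    logits = np.log(prior_probs) + q_values / eta
    w = np.exp(logits - logits.max())
    return w / w.sum()
```

The temperature η controls how far the improved distribution moves from the prior: large η stays close to it, small η concentrates on the greedy action.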

Machine Theory of Mind [article]

Neil C. Rabinowitz, Frank Perbet, H. Francis Song, Chiyuan Zhang, S.M. Ali Eslami, Matthew Botvinick
2018 arXiv   pre-print
for machine-human interaction, and for advancing the progress on interpretable AI.  ...  We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it  ...  Deep RL agent training and architecture Deep RL agents were based on the UNREAL architecture (Jaderberg et al., 2017) . These were trained with over 100M episode steps, using 16 CPU workers.  ... 
arXiv:1802.07740v2 fatcat:eosf7bo3ybbybo6tethmc4nrie

Bayesian Bellman Operators [article]

Matthew Fellows, Kristian Hartikainen, Shimon Whiteson
2021 arXiv   pre-print
We prove that Bayesian solutions are consistent with frequentist RL solutions, even when approximate inference is used, and derive conditions for which convergence properties hold.  ...  In this paper, we use BBO to provide a rigorous theoretical analysis of model-free Bayesian RL to better understand its relationship to established frequentist RL methodologies.  ...  We would like to thank Piotr Miłoś, whose proof for a similar problem inspired our proof of Lemma 3.  ... 
arXiv:2106.05012v3 fatcat:aj6crm7dmzhhbldfkiiw2sxz2a

VIREL: A Variational Inference Framework for Reinforcement Learning [article]

Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson
2020 arXiv   pre-print
learning deterministic policies in maximum entropy RL based approaches.  ...  We propose VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP.  ...  Yet another approach, the MERL inference framework [34] (which we refer to as MERLIN), derives from maximum entropy reinforcement learning (MERL) [32, 72, 73, 71] .  ... 
arXiv:1811.01132v9 fatcat:krhtzi34fjdljo6kiviqmpagxy

AutonoML: Towards an Integrated Framework for Autonomous Machine Learning [article]

David Jacob Kedziora and Katarzyna Musial and Bogdan Gabrys
2022 arXiv   pre-print
Systematically comparing two distinct types of model, like an SVM against an MLP, adds yet another layer of challenge.  ...  Alternatively, complementing GAs and RL, another major class of NAS approaches revolves around gradient optimisation.  ... 
arXiv:2012.12600v2 fatcat:6rj4ubhcjncvddztjs7tql3itq

COGENT: Certified Compilation for a Functional Systems Language [article]

Liam O'Connor, Christine Rizkallah, Zilin Chen, Sidney Amani, Japheth Lim, Yutaka Nagashima, Thomas Sewell, Alex Hixon, Gabriele Keller, Toby Murray, Gerwin Klein
2016 arXiv   pre-print
We present a self-certifying compiler for the COGENT systems language.  ...  The language is suited for layered systems code with minimal sharing such as file systems or network protocol control code.  ...  Our type system is loosely based on the polymorphic λURAL of Ahmed et al. [2005] .  ... 
arXiv:1601.05520v1 fatcat:xulbglfvp5ccdjeqphngckbz34

Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces [article]

Gellért Weisz, Paweł Budzianowski, Pei-Hao Su, Milica Gašić
2018 arXiv   pre-print
We show that this method beats the current state-of-the-art in deep learning approaches for spoken dialogue systems.  ...  In this paper, we investigate deep reinforcement learning approaches to solve this problem.  ...  ACKNOWLEDGMENT The authors would like to thank all members of the Dialogue Systems Group for useful comments and suggestions.  ... 
arXiv:1802.03753v1 fatcat:4gwej6zovjbhhcewhq4zldbyge