Challenges in High-dimensional Reinforcement Learning with Evolution Strategies [article]

Nils Müller, Tobias Glasmachers
2018 arXiv   pre-print
Evolution Strategies (ESs) have recently become popular for training deep neural networks, in particular on reinforcement learning tasks, a special form of controller design.  ...  In addition, many control problems give rise to a stochastic fitness function.  ...  Our results indicate that a scalable modern evolution strategy with step size and efficient metric learning equipped with uncertainty handling is the most promising general-purpose technique for high-dimensional  ... 
arXiv:1806.01224v2 fatcat:oz3qbxjknrhzpfjatd2jmwrxci

CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning [article]

Kevin Huang, Sahin Lale, Ugo Rosolia, Yuanyuan Shi, Anima Anandkumar
2021 arXiv   pre-print
Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings.  ...  These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for large prediction horizons or high dimensional action spaces.  ...  Methods Preliminaries: Cross-Entropy Method for Trajectory planning In model-based reinforcement learning Nagabandi et al. (2018), a common scheme for action selection is to use model predictive control  ... 
arXiv:2112.07746v1 fatcat:fg5qovlnhres7bb5bbwtfq64d4

High-Accuracy Model-Based Reinforcement Learning, a Survey [article]

Aske Plaat and Walter Kosters and Mike Preuss
2021 arXiv   pre-print
In recent years, a diverse landscape of model-based methods has been introduced to improve model accuracy, using methods such as uncertainty modeling, model-predictive control, latent models, and end-to-end  ...  To reduce the number of environment samples, model-based reinforcement learning creates an explicit model of the environment dynamics.  ...  Acknowledgments We thank the members of the Leiden Reinforcement Learning Group, and especially Thomas Moerland and Mike Huisman, for many discussions and insights.  ... 
arXiv:2107.08241v1 fatcat:tma6xb2uy5fybjfhmzasfx2cta

Autonomous Exploration Under Uncertainty via Deep Reinforcement Learning on Graphs [article]

Fanfei Chen, John D. Martin, Yewei Huang, Jinkun Wang, Brendan Englot
2020 arXiv   pre-print
We propose a novel approach that uses graph neural networks (GNNs) in conjunction with deep reinforcement learning (DRL), enabling decision-making over graphs containing exploration information to predict  ...  For this problem, belief space planning methods that forward-simulate robot sensing and estimation may often fail in real-time implementation, scaling poorly with increasing size of the state, belief and  ...  In this paper we consider value-based methods and policy-based methods for model-free control.  ... 
arXiv:2007.12640v1 fatcat:4x2lsdzoazeq7ayauocjsjqqq4

Risk-Aware Model-Based Control

Chen Yu, Andre Rosendo
2021 Frontiers in Robotics and AI  
In this work, a novel MBRL method is proposed, called Risk-Aware Model-Based Control (RAMCO).  ...  facing high-dimensional and complex problems.  ...  RISK-AWARE MODEL-BASED CONTROL In this work, we propose a model-based method with a probabilistic dynamics model, and our main objective is to learn a safe and scalable policy efficiently.  ... 
doi:10.3389/frobt.2021.617839 pmid:33778013 pmcid:PMC7990789 fatcat:v4thq6253zgjpde6zhowd6ufca

Real-Time Model Calibration with Deep Reinforcement Learning [article]

Yuan Tian, Manuel Arias Chao, Chetan Kulkarni, Kai Goebel, Olga Fink
2020 arXiv   pre-print
In this paper, we propose a novel framework for inference of model parameters based on reinforcement learning.  ...  However, fast and accurate inference for processes with large and high dimensional datasets cannot easily be achieved with state-of-the-art methods under noisy real-world conditions.  ...  Scalability to Large Dataset and High Dimensional Model Calibration Parameters θ.  ... 
arXiv:2006.04001v2 fatcat:dndjxnsaubhi3jkpxu6tkqkfdm

Scalable Global Optimization via Local Bayesian Optimization [article]

David Eriksson, Michael Pearce, Jacob R Gardner, Ryan Turner, Matthias Poloczek
2020 arXiv   pre-print
Bayesian optimization has recently emerged as a popular method for the sample-efficient optimization of expensive black-box functions.  ...  This motivates the design of a local probabilistic approach for global optimization of large-scale high-dimensional problems.  ...  Lunar landing reinforcement learning Here the goal is to learn a controller for a lunar lander implemented in the OpenAI gym 3 .  ... 
arXiv:1910.01739v4 fatcat:z6orkycjnjgmjcvdaymd2xp4gm

An Adversarial Objective for Scalable Exploration [article]

Bernadette Bucher, Karl Schmeckpeper, Nikolai Matni, Kostas Daniilidis
2020 arXiv   pre-print
Model-based curiosity combines active learning approaches to optimal sampling with the information gain based incentives for exploration presented in the curiosity literature.  ...  This discriminator is optimized jointly with a prediction model and enables our active learning approach to sample sequences of observations and actions which result in predictions considered the least  ...  ACKNOWLEDGEMENTS The authors are grateful for support through the Curious Minded Machines project funded by the Honda Research Institute.  ... 
arXiv:2003.06082v4 fatcat:gb7vwk35p5fk3drgho37tbx3dq

Building a Scalable and Interpretable Bayesian Deep Learning Framework for Quality Control of Free Form Surfaces

Sumit Sinha, Pasquale Franciosa, Dariusz Ceglarek
2021 IEEE Access  
ACKNOWLEDGMENT This study was supported by the UK EPSRC project EP/K019368/1: "Self-Resilient Reconfigurable Assembly Systems with In-process Quality Improvement" and the WMG-IIT scholarship.  ...  Further, to exponentially enhance the scalability for high dimensional MAS and reduce CAE simulation time, uncertainty guided continual learning [8] and transfer learning [7] features are integrated  ...  for a single MAS by leveraging uncertainty estimates of the Bayesian 3D U-net OSER-MAS model. (2) Uncertainty guided transfer/continual learning-based scalability model to transfer meta-knowledge from  ... 
doi:10.1109/access.2021.3068867 fatcat:wb6yf6j7nbdxrifpmbg6vew6iy

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach [article]

James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras
2020 arXiv   pre-print
When combined with small sample sizes, these methods can result in unstable learning due to their reliance on high-dimensional sample-based estimates.  ...  In order for reinforcement learning techniques to be useful in real-world decision making processes, they must be able to produce robust performance from limited data.  ...  Acknowledgments This research was partially supported by the NSF under grants ECCS-1931600, DMS-1664644, CNS-1645681, and IIS-1914792  ... 
arXiv:2012.10791v1 fatcat:euqqdcoi4rea3i52ww7p3poyte

A Survey on Learning-Based Model Predictive Control: Toward Path Tracking Control of Mobile Platforms

Kanghua Zhang, Jixin Wang, Xueting Xin, Xiang Li, Chuanwen Sun, Jianfei Huang, Weikang Kong
2022 Applied Sciences  
The learning-based model predictive control (LB-MPC) is an effective and critical method to solve the path tracking problem in mobile platforms under uncertain disturbances.  ...  It is well known that the machine learning (ML) methods use the historical and real-time measurement data to build data-driven prediction models.  ...  Acknowledgments: The authors would like to thank all anonymous reviewers and editors for their helpful suggestions for the improvement of this paper.  ... 
doi:10.3390/app12041995 fatcat:hdy753pdbfhjvihdopaw6lip2u

Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning [article]

Yecheng Jason Ma, Andrew Shen, Osbert Bastani, Dinesh Jayaraman
2021 arXiv   pre-print
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective.  ...  We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the  ...  Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein  ... 
arXiv:2112.07701v1 fatcat:fpaqm7dd5jfujcwzozgn4yotly

Wide Area Measurement System-based Low Frequency Oscillation Damping Control through Reinforcement Learning [article]

Yousaf Hashmy, Zhe Yu, Di Shi, Yang Weng
2020 arXiv   pre-print
Such a technique has a unique characteristic to learn on diverse scenarios and operating conditions by exploring the environment and devising an optimal control action policy by implementing policy gradient  ...  Lately, wide area measurement system-based centralized controlling techniques started providing a more flexible and robust control to keep the system stable.  ...  Such a systematic validation shows the applicability and scalability of the reinforcement learning-based oscillation damping controller.  ... 
arXiv:2001.07829v1 fatcat:jzp66s4w4nfo5botusqsigacdy

Scalable Learning Paradigms for Data-Driven Wireless Communication [article]

Yue Xu, Feng Yin, Wenjun Xu, Chia-Han Lee, Jiaru Lin, Shuguang Cui
2020 arXiv   pre-print
However, the ever exploding data volume and model complexity will limit centralized solutions to learn and respond within a reasonable time.  ...  Therefore, scalability becomes a critical issue to be solved. In this article, we aim to provide a systematic discussion on the building blocks of scalable data-driven wireless networks.  ...  Besides, Bayesian neural networks and deep GP algorithms can be combined with RL techniques to form Bayesian deep reinforcement learning (BDRL) to achieve a fully Bayesian control framework.  ... 
arXiv:2003.00474v1 fatcat:kd6plphwgbfvbdyylcn4jk24uq

Deep Reinforcement Learning Versus Evolution Strategies: A Comparative Survey [article]

Amjad Yousef Majid, Serge Saaybi, Tomas van Rietbergen, Vincent Francois-Lavet, R Venkatesha Prasad, Chris Verhoeven
2021 arXiv   pre-print
After presenting their fundamental concepts and algorithms, a comparison is provided on key aspects such as scalability, exploration, adaptation to dynamic environments, and multi-agent learning.  ...  Deep Reinforcement Learning (DRL) and Evolution Strategies (ESs) have surpassed human-level control in many sequential decision-making problems, yet many open challenges still exist.  ...  Their approach consists of two main steps: (i) learning a state representation and initial policy from high-dimensional input data using gradient-based methods (i.e., DQN or DDPG); and (ii) fine-tuning  ... 
arXiv:2110.01411v1 fatcat:nw47ududyndyljlh4nx2gm73jq