7,170 Hits in 5.4 sec

A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Zijian Hu, Kaifang Wan, Xiaoguang Gao, Yiwei Zhai
2019 Mathematical Problems in Engineering  
In deep reinforcement learning, network convergence is often slow and prone to locally optimal solutions.  ...  For an environment with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with variable parameters from the perspective of sample usage.  ...  Wan for the discussions. This study was supported by the National Natural Science Foundation of China (grant no. 61573285) and the Science and Technology on Avionics Integration Laboratory and Aeronautical  ... 
doi:10.1155/2019/7619483 fatcat:wgcvlxskr5djzbfrnouuliwt3m

Composable Energy Policies for Reactive Motion Generation and Reinforcement Learning

Julen Urain, Anqi Li, Puze Liu, Carlo D'Eramo, Jan Peters
2021 Zenodo  
CEP computes the control action by optimization over the product of a set of stochastic policies.  ...  We introduce Composable Energy Policies (CEP), a novel framework for modular reactive motion generation.  ...  Fig. 6: Training curves for the Hitting-a-puck environment. CEP-PPO performs consistently better than other prior + RL methods.  ... 
doi:10.5281/zenodo.5336918 fatcat:fezqux2ejzaknfll3deasrapcy
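The entry above says CEP computes the control action by optimizing over a product of stochastic policies. As a purely illustrative sketch (not the authors' energy-based implementation, which is not restricted to Gaussians): if each component policy is a Gaussian N(mu_i, sigma_i^2) over a scalar action, their product is again Gaussian with precision-weighted mean, so the fused action has a closed form.

```python
import numpy as np

# Illustrative only: fuse scalar Gaussian policies by taking their
# product. The product of Gaussians is Gaussian with precision-weighted
# mean, so the maximizing action is that mean.
def product_of_gaussian_policies(mus, sigmas):
    precisions = 1.0 / np.square(sigmas)   # 1 / sigma_i^2
    prec = precisions.sum()                # fused precision
    mu = (precisions * mus).sum() / prec   # precision-weighted mean
    return mu, np.sqrt(1.0 / prec)

# Two component policies: one prefers action 0.0, one prefers 1.0;
# the second is more confident (smaller sigma), so it dominates.
mu, sigma = product_of_gaussian_policies(np.array([0.0, 1.0]),
                                         np.array([1.0, 0.5]))
```

With precisions 1 and 4, the fused mean is (0·1 + 1·4)/5 = 0.8: the composed policy leans toward the more confident component, which is the qualitative behavior the product construction is after.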

Composable Energy Policies for Reactive Motion Generation and Reinforcement Learning [article]

Julen Urain, Anqi Li, Puze Liu, Carlo D'Eramo, Jan Peters
2021 arXiv   pre-print
CEP computes the control action by optimization over the product of a set of stochastic policies.  ...  We introduce Composable Energy Policies (CEP), a novel framework for modular reactive motion generation.  ...  Fig. 6: Training curves for the Hitting-a-puck environment. CEP-PPO performs consistently better than other prior + RL methods.  ... 
arXiv:2105.04962v1 fatcat:mtyhycafqbgmlbfglcuq2wwbza

Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning [article]

Wendelin Böhmer and Rong Guo and Klaus Obermayer
2016 arXiv   pre-print
The presented approach is simple and should also be easily transferable to more sophisticated algorithms like deep reinforcement learning.  ...  Successful trajectories reach the goal within 100 actions without hitting a wall.  ...  CPI converges for small α ∈ [0, 1].  ... 
arXiv:1612.07548v1 fatcat:cj3eyhvydzbrhhqbvxv4sykzxe
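The snippet above notes that CPI converges for small α ∈ [0, 1]. A minimal sketch of the α-mixing idea behind conservative policy improvement (assumed standard CPI-style mixing, not necessarily the paper's exact update): rather than switching to the greedy policy outright, blend it into the current one.

```python
import numpy as np

# Sketch of CPI-style conservative policy improvement: the new policy
# is the mixture (1 - alpha) * pi_old + alpha * pi_greedy, which stays
# a valid distribution and moves only a small step toward greedy.
def cpi_mix(pi_old, pi_greedy, alpha):
    return (1.0 - alpha) * pi_old + alpha * pi_greedy

pi_old = np.array([1.0, 0.0])     # current policy over two actions
pi_greedy = np.array([0.0, 1.0])  # greedy policy w.r.t. current values
pi_new = cpi_mix(pi_old, pi_greedy, 0.1)  # small, stabilizing step
```

Because the mixture of two probability vectors is again a probability vector, each update changes the induced state distribution only slightly for small α, which is what makes the improvement step stable under function approximation.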

Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration [article]

André Biedenkapp, Nguyen Dang, Martin S. Krejca, Frank Hutter, Carola Doerr
2022 arXiv   pre-print
One of the few exceptions for which we know which parameter settings minimize the expected runtime is the LeadingOnes problem.  ...  Several approaches to address the dynamic parameter setting problem exist, but we barely understand which ones to prefer for which applications.  ...  The authors acknowledge the HPCaVe computing platform of Sorbonne Université for providing computational resources to this research project.  ... 
arXiv:2202.03259v2 fatcat:ppyu26zeyfeq7dpbwh3dz7h2ae

Modular Reinforcement Learning: A Case Study in a Robot Domain

Zsolt Kalmár, Csaba Szepesvári, András Lörincz
2000 Acta Cybernetica  
The key idea is to break up the problem into subtasks and design controllers for each of the subtasks.  ...  The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state-and action-space, discrete-time controlled Markov-chains.  ...  Acknowledgements The authors would like to thank Zoltán Gábor for his efforts of building the experimental environment. This work was supported by the CSEM Switzerland, OTKA Grants No.  ... 
dblp:journals/actaC/KalmarSL00 fatcat:yknd6eiluraxxmsp43jf74z45y

Theory of Parameter Control for Discrete Black-Box Optimization: Provable Performance Gains Through Dynamic Parameter Choices [article]

Benjamin Doerr, Carola Doerr
2018 arXiv   pre-print
Parameter control aims at realizing performance gains through a dynamic choice of the parameters which determine the behavior of the underlying optimization algorithm.  ...  In the context of evolutionary algorithms this research line has for a long time been dominated by empirical approaches.  ...  This work was supported by a public grant as part of the Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with the Gaspard Monge Program for optimization, operations  ... 
arXiv:1804.05650v2 fatcat:lspslqoz5faepch2grmnjgplwi

Versatile Control of Fluid-Directed Solid Objects Using Multi-Task Reinforcement Learning

Bo Ren, Xiaohan Ye, Zherong Pan, Taiyuan Zhang
2022 ACM Transactions on Graphics  
We propose a learning-based controller for high-dimensional dynamic systems with coupled fluid and solid objects.  ...  In our experiments, the average correct hit rate is above 70%, and it can reach 100% in some simple-music playing trials.  ...  In our experiments, we find this small set of tuples is enough for the sub-network to learn a good simulator parameter representation.  ... 
doi:10.1145/3554731 fatcat:ahkvuqlyerfuxkeu7cd6yy7xrm

A Survey on Reinforcement Learning-Aided Caching in Heterogeneous Mobile Edge Networks

Nikolaos Nomikos, Spyros Zoupanos, Themistoklis Charalambous, Ioannis Krikidis
2022 IEEE Access  
Among the various machine learning categories, reinforcement learning provides autonomous operation without relying on large sets of historical data for training.  ...  Meanwhile, the fusion of machine learning and wireless networks offers new opportunities for network optimization when traditional optimization approaches fail or incur high complexity.  ...  In the third step, RL is used to determine the optimal caching decision which maximizes the cache hit rate.  ... 
doi:10.1109/access.2022.3140719 fatcat:565r4jxrinfxtpgvuc5xfvxmle

RLCache: Automated Cache Management Using Reinforcement Learning [article]

Sami Alabed
2019 arXiv   pre-print
An optimal cache manager will avoid unnecessary operations, maximise the cache hit rate which results in fewer round trips to a slower backend storage system, and minimise the size of storage needed to  ...  achieve a high hit-rate.  ...  Eiko Yoneki for valuable and constructive suggestions during the planning and development of this research work.  ... 
arXiv:1909.13839v1 fatcat:4k7rj3letzez3ajpg5n5kccp6y
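The RLCache entry frames cache management around maximizing the hit rate. As a baseline sketch (a plain LRU cache, not the RL-based manager itself), the metric it optimizes is simply hits / requests:

```python
from collections import OrderedDict

# Baseline LRU cache that tracks the hit rate an RL cache manager
# would try to maximize. Illustrative; not RLCache's policy.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.requests = 0

    def get(self, key):
        self.requests += 1
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)      # mark as recently used
            return self.store[key]
        return None                          # miss: backend round trip

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0
```

A learned manager replaces the fixed eviction rule (`popitem(last=False)`) with a decision trained against exactly this hit-rate signal, trading storage size against round trips to the slower backend.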

Tree-Structured Reinforcement Learning for Sequential Object Localization [article]

Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu and Shuicheng Yan
2017 arXiv   pre-print
Allowing multiple near-optimal policies, Tree-RL offers more diversity in search paths and is able to find multiple objects with a single feed-forward pass.  ...  To incorporate global interdependency between objects into object localization, we propose an effective Tree-structured Reinforcement Learning (Tree-RL) approach to sequentially search for objects by fully  ...  Tree-RL vs Single Optimal Search Path RL: We first compare the performance in recall rate between the proposed Tree-RL and a single optimal search path RL on PASCAL VOC 2007 testing set.  ... 
arXiv:1703.02710v1 fatcat:he7f3lx2ujh6vaunqzajhqlgnu

Identifying Critical States by the Action-Based Variance of Expected Return [article]

Izumi Karino, Yoshiyuki Ohmura, Yasuo Kuniyoshi
2020 arXiv   pre-print
These simple methods accelerate RL in a grid world with cliffs and two baseline tasks of deep RL.  ...  These critical states are the states at which the action selection changes the potential of success and failure substantially.  ...  based partly on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO), and partly on results obtained from research activity in Chair for  ... 
arXiv:2008.11332v1 fatcat:45jw56yo25a6zfdcbu6nyw35ca
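The entry above identifies critical states via the action-based variance of expected return. A hedged reading of that criterion (assumed from the abstract, not the paper's exact estimator): score a state by the variance of its Q-values across actions, so states where the action choice swings the outcome, such as a cliff edge, score high.

```python
import numpy as np

# Score a state by how much the expected return varies with the
# chosen action: high variance = action selection matters = critical.
def criticality(q_values):
    return float(np.var(q_values))

q_cliff_edge = [1.0, -10.0, 0.9]   # one action falls off the cliff
q_open_field = [0.5, 0.49, 0.51]   # all actions roughly equivalent
```

In a grid world with cliffs, this flags the cliff-edge states long before a full policy is learned, which is what lets such a measure accelerate exploration.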

Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning [article]

Ramtin Keramati, Jay Whang, Patrick Cho, Emma Brunskill
2018 arXiv   pre-print
Inspired by this, we investigate two issues in leveraging model-based RL for sample efficiency.  ...  People seem to build simple models that are easy to learn to support planning and strategic exploration.  ...  In this setting, the optimal policy is to always hit the ball with the lower region of the ball. The game is deterministic and model-free methods with ε-greedy exploration (e.g.  ... 
arXiv:1806.00175v2 fatcat:f4cbjoacq5fwlorjwgvjes7mgi

Sample-efficient Reinforcement Learning in Robotic Table Tennis [article]

Jonas Tebbe, Lukas Krauch, Yapeng Gao, Andreas Zell
2021 arXiv   pre-print
Reinforcement learning (RL) has achieved some impressive recent successes in various computer games and simulations.  ...  An actor-critic based deterministic policy gradient algorithm was developed for accelerated learning.  ...  Our key contributions are summarized by the following: • A simulation was developed and used to tune an RL algorithm based on DDPG for rapid learning on small datasets. • The RL algorithm was integrated  ... 
arXiv:2011.03275v3 fatcat:giv2ni6pcvf5zk5ekt4exihxqq

Affordable On-line Dialogue Policy Learning

Cheng Chang, Runzhe Yang, Lu Chen, Xiang Zhou, Kai Yu
2017 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing  
For solving the unsustainable learning problem, we proposed a complete companion teaching framework incorporating the guidance from the human teacher.  ...  And for policy learning, we set a small per-turn penalty of one to encourage short interactions, i.e.  ...  The empirical satisfactory target success rate for the student is 70% in our experimental settings.  ... 
doi:10.18653/v1/d17-1234 dblp:conf/emnlp/ChangYCZY17 fatcat:vttfnsmm65enxmscfy75lz67gq