A Unified q-Memorization Framework for Asynchronous Stochastic Optimization

Bin Gu, Wenhan Xian, Zhouyuan Huo, Cheng Deng, Heng Huang
2020 Journal of machine learning research  
Specifically, based on the q-memorization framework, 1) we propose an asynchronous stochastic gradient hard thresholding algorithm with q-memorization (AsySGHT-qM) for the non-convex optimization with  ...  In this paper, we bridge this gap by using a unified q-memorization framework for various variance reduction techniques (including SVRG, S2GD, SAGA, q-SAGA) to analyze asynchronous stochastic algorithms  ...  based on the unified q-memorization framework.  ... 
dblp:journals/jmlr/GuXHDH20 fatcat:6mqu7l6jz5gtjkmrux5qihhoxu

Using Deep Reinforcement Learning for the Continuous Control of Robotic Arms [article]

Winfried Lötzsch
2018 arXiv   pre-print
The concluding evaluation proves the general applicability of the described concepts by testing them using a simulated environment. These concepts might be reused for future experiments.  ...  We test whether a newly created combination of two commonly used reinforcement learning methods is able to learn more effectively than a baseline.  ...  Both outputs are unified to form an estimate of the Q-value which is then used for training.  ... 
arXiv:1810.06746v1 fatcat:pz5xo5ezdnf57pfjlxzvjcifxa

2021 Index IEEE Transactions on Neural Networks and Learning Systems Vol. 32

2021 IEEE Transactions on Neural Networks and Learning Systems  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, TNNLS Nov. 2021 5022-5033 A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems.  ...  ., +, TNNLS July 2021 3156-3167 Decoding A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems.  ... 
doi:10.1109/tnnls.2021.3134132 fatcat:2e7comcq2fhrziselptjubwjme

Learning RoboCup-Keepaway with Kernels

Tobias Jung, Daniel Polani
2007 Journal of machine learning research  
We employ the general framework of approximate policy iteration with least-squares-based policy evaluation.  ...  Key challenges in keepaway are the high-dimensionality of the state space (rendering conventional discretization-based function approximation like tilecoding infeasible), the stochasticity due to noise  ...  Acknowledgments The authors wish to thank the anonymous reviewers for their useful comments and suggestions.  ... 
dblp:journals/jmlr/JungP07 fatcat:vqrrlh2vxnbblbamesobl4uxmu

Learning RoboCup-Keepaway with Kernels [article]

Tobias Jung, Daniel Polani
2012 arXiv   pre-print
We employ the general framework of approximate policy iteration with least-squares-based policy evaluation.  ...  Key challenges in keepaway are the high-dimensionality of the state space (rendering conventional discretization-based function approximation like tilecoding infeasible), the stochasticity due to noise  ...  Acknowledgments The authors wish to thank the anonymous reviewers for their useful comments and suggestions.  ... 
arXiv:1201.6626v1 fatcat:xsbsokwdyncink3unpvenjowoi

On Monte Carlo Tree Search and Reinforcement Learning

Tom Vodopivec, Spyridon Samothrakis, Branko Ster
2017 The Journal of Artificial Intelligence Research  
Our study promotes a unified view of learning, planning, and search.  ...  We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, for which the traditional MCTS is only one of the variants.  ...  We would like to thank also Michael Fairbank and Mark Nelson for providing very useful last-minute feedback.  ... 
doi:10.1613/jair.5507 fatcat:igffnyo5hfbyzigzxpp6t6pebi

Applications of Deep Reinforcement Learning in Communications and Networking: A Survey [article]

Nguyen Cong Luong, Dinh Thai Hoang, Shimin Gong, Dusit Niyato, Ping Wang, Ying-Chang Liang, Dong In Kim
2018 arXiv   pre-print
Furthermore, we present applications of deep reinforcement learning for traffic routing, resource sharing, and data collection.  ...  This paper presents a comprehensive literature review on applications of deep reinforcement learning in communications and networking.  ...  unified framework.  ... 
arXiv:1810.07862v1 fatcat:qc3mqk2norazvc2xnynau6bqzu

A Survey of Deep Reinforcement Learning in Video Games [article]

Kun Shao, Zhentao Tang, Yuanheng Zhu, Nannan Li, Dongbin Zhao
2019 arXiv   pre-print
A large number of video game AIs with DRL have achieved super-human performance, while there are still some challenges in this domain.  ...  We also take a review of the achievements of DRL in various video games, including classical Arcade games, first-person perspective games and multi-agent real-time strategy games, from 2D to 3D, and from  ...  ACKNOWLEDGMENT The authors would like to thank Qichao Zhang, Dong Li and Weifan Li for the helpful comments and discussions about this work.  ... 
arXiv:1912.10944v2 fatcat:fsuzp2sjrfcgfkyclrsyzflax4

Applications of Deep Reinforcement Learning in Communications and Networking: A Survey

Nguyen Cong Luong, Dinh Thai Hoang, Shimin Gong, Dusit Niyato, Ping Wang, Ying-Chang Liang, Dong In Kim
2019 IEEE Communications Surveys and Tutorials  
Furthermore, we present applications of deep reinforcement learning for traffic routing, resource sharing, and data collection.  ...  This paper presents a comprehensive literature review on applications of deep reinforcement learning in communications and networking.  ...  unified framework.  ... 
doi:10.1109/comst.2019.2916583 fatcat:5owsswhhrbctnirdtxre6mhv24

Zero-touch Continuous Network Slicing Control via Scalable Actor-Critic Learning [article]

Farhad Rezazadeh, Hatim Chergui, Christos Verikoukis
2021 arXiv   pre-print
knowledge learned in the past to solve future problems and re-configure computing resources autonomously while minimizing latency, energy consumption, and virtual network function (VNF) instantiation cost for  ...  The paper defines and corroborates via extensive experimental results a zero-touch network slicing scheme with a multi-objective approach where the central server learns continuously to accumulate the  ...  Furthermore, we use actor-critic architecture and pursue a joint policy and state-action return distribution optimization where we propose a prioritized asynchronous actor-learner optimized for the network  ... 
arXiv:2101.06654v1 fatcat:ijtxmfamifdvno2vjkjcqb2m5e

Reducing Noise in GAN Training with Variance Reduced Extragradient [article]

Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien
2020 arXiv   pre-print
We address this issue with a novel stochastic variance-reduced extragradient (SVRE) optimization algorithm, which for a large class of games improves upon the previous convergence rates proposed in the  ...  We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can prevent the convergence of standard game optimization methods, while  ...  Grant RGPIN-2017-06936, by the Hasler Foundation through the MEMUDE project, and by a Google  ... 
arXiv:1904.08598v3 fatcat:gotf432ufbfg3dhsdbu6lwabvi

Policy Message Passing: A New Algorithm for Probabilistic Graph Inference [article]

Zhiwei Deng, Greg Mori
2019 arXiv   pre-print
In this paper, we present the Policy Message Passing algorithm, which takes a probabilistic perspective and reformulates the whole information aggregation as stochastic sequential processes.  ...  A general graph-structured neural network architecture operates on graphs through two core components: (1) complex enough message functions; (2) a fixed information aggregation process.  ...  We adopt the standard variational inference method to derive the following equations: $\log p(y \mid s) - \mathrm{KL}\big(q(\tau \mid y, s) \,\|\, \pi^{*}(\tau \mid y, s)\big) = \mathbb{E}_{\tau \sim q} \log p(y \mid \tau, s) - \mathrm{KL}\big(q(\tau \mid s, y) \,\|\, \pi(\tau \mid s)\big)$ (9). We can optimize the  ... 
arXiv:1909.13196v1 fatcat:ah2urnhnvjhjleibyik6iuqvuy

Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning [article]

Lukas Brunke, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, Angela P. Schoellig
2021 arXiv   pre-print
This article provides a concise but holistic review of the recent advances made in using machine learning to achieve safe decision making under uncertainties, with a focus on unifying the language and  ...  frameworks used in control theory and reinforcement learning research.  ...  In the proposed approach, the last layer of the DNN is updated at a higher frequency for fast adaptation, while the inner layers are updated at a lower frequency to "memorize" pertinent features for the  ... 
arXiv:2108.06266v2 fatcat:gbbe3qyatfgelgzhqzglecr5qm

RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control [article]

Siddhant Gangapurwala, Mathieu Geisert, Romeo Orsolino, Maurice Fallon, Ioannis Havoutis
2020 arXiv   pre-print
We present a unified model-based and data-driven approach for quadrupedal planning and control to achieve dynamic locomotion over uneven terrain.  ...  We train and evaluate our framework on a complex quadrupedal system, ANYmal version B, and demonstrate transferability to a larger and heavier robot, ANYmal C, without requiring retraining.  ...  In this work, we propose a unified RL and optimal control (OC) based terrain-aware legged locomotion framework as illustrated in Fig. 3.  ... 
arXiv:2012.03094v1 fatcat:ynp2g3ng6rbe3jsmqopz6rrwu4

Robust Temporal Ensembling for Learning with Noisy Labels [article]

Abel Brown, Benedikt Schifferer, Robert DiPietro
2021 arXiv   pre-print
Finally, we show that RTE also retains competitive corruption robustness to unforeseen input noise using CIFAR-10-C, obtaining a mean corruption error (mCE) of 13.50% even in the presence of an 80% noise  ...  simple asynchronous optimisation algorithm that jointly optimizes a population of models and their hyperparameters.  ...  In this way, the ECR term can loosely be seen as generating a set of stochastic differential constraints at each optimization step of the classification task loss.  ... 
arXiv:2109.14563v1 fatcat:7xyyzhv3abgcrn6nj7xpiehdbq