
Power Management for Multicore Processors via Heterogeneous Voltage Regulation and Machine Learning Enabled Adaptation

Xin Zhan, Jianhao Chen, Edgar Sanchez-Sinencio, Peng Li
2019 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
By exploring the rich heterogeneity and tunability in HVR, we develop systematic workload-aware power management policies to adapt heterogeneous VRs with respect to workload change at multiple temporal  ...  The proposed techniques are further supported by hardware-accelerated machine learning (ML) prediction of nonuniform spatial workload distributions for more accurate HVR adaptation at fine time granularity  ...  With full consideration of the energy interdependency in the regulation chain, the proposed control policy achieves a near-optimal overall power efficiency by carefully trading off power loss at different  ... 
doi:10.1109/tvlsi.2019.2923911 fatcat:bizdu2xy4bf2pjyobzuk7l53c4

An Adiabatic Theorem for Policy Tracking with TD-learning [article]

Neil Walton
2020 arXiv   pre-print
We derive finite-time bounds for tabular temporal difference learning and Q-learning when the policy used for training changes in time.  ...  We evaluate the ability of temporal difference learning to track the reward function of a policy as it changes over time.  ...  Introduction Policy evaluation and, in particular, temporal difference (TD) learning is a key ingredient in reinforcement learning.  ... 
arXiv:2010.12848v2 fatcat:3o3szb6rfnafvchysloysjziie
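As background for the entry above: tabular TD(0), the method whose tracking behavior the paper bounds, updates a value estimate from individual transitions. A minimal Python sketch (the two-state chain, step size, and discount are illustrative assumptions, not the paper's setting):

    import numpy as np

    def td0_evaluate(transitions, n_states, alpha=0.1, gamma=0.9):
        """Tabular TD(0): update V from (s, r, s') transitions."""
        V = np.zeros(n_states)
        for s, r, s_next in transitions:
            td_error = r + gamma * V[s_next] - V[s]   # temporal-difference error
            V[s] += alpha * td_error
        return V

    # Illustrative two-state chain: 0 -> 1 with reward 0, 1 -> 0 with reward 1.
    transitions = [(t % 2, float(t % 2), (t + 1) % 2) for t in range(5000)]
    print(td0_evaluate(transitions, n_states=2))   # approx [4.74, 5.26]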

Deep Reinforcement and InfoMax Learning [article]

Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm
2020 arXiv   pre-print
Finally, we augment C51, a strong RL baseline, with our temporal DIM objective and demonstrate improved performance on a continual learning task and on the recently introduced Procgen environment.  ...  We test our approach in several synthetic settings, where it successfully learns representations that are predictive of the future.  ...  The objective shows improvements in a continual learning setting, as well as on average training rewards for a suite of complex video games.  ... 
arXiv:2006.07217v3 fatcat:fxpjpgrsvrgc7fshqzdxwhnacq
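The temporal DIM objective above is a mutual-information bound between current and future representations; a common instantiation of such objectives is an InfoNCE-style contrastive loss with in-batch negatives. A minimal numpy sketch (the dot-product scorer and toy embeddings are assumptions for illustration, not the paper's exact objective):

    import numpy as np

    def temporal_infonce_loss(z_t, z_tk):
        """InfoNCE-style contrastive loss: each z_t's positive is its own
        future embedding in z_tk; other batch entries act as negatives."""
        scores = z_t @ z_tk.T                         # (B, B) similarity matrix
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_softmax))         # positives on the diagonal

    rng = np.random.default_rng(0)
    z_t = rng.normal(size=(8, 16))
    z_tk = z_t + 0.1 * rng.normal(size=(8, 16))       # correlated "future" states
    print(temporal_infonce_loss(z_t, z_tk))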

Bellman Error Based Feature Generation using Random Projections on Sparse Spaces [article]

Mahdi Milani Fard, Yuri Grinberg, Amir-massoud Farahmand, Joelle Pineau, Doina Precup
2012 arXiv   pre-print
Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration.  ...  We propose a simple, fast and robust algorithm based on random projections to generate BEBFs for sparse feature spaces.  ...  Least-squares temporal difference learning (LSTD) and its derivations [11, 12] are among the methods used to learn a value function based on a finite sample.  ... 
arXiv:1207.5554v3 fatcat:gkeuxgs3avgxfbft6tyfjp5jnm
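One round of the approach described in this entry: regress the Bellman residual on randomly projected sparse features and add the fit as a new basis function. A hedged sketch (the Gaussian projection, least-squares fit, and toy data are illustrative choices, not necessarily the paper's exact algorithm):

    import numpy as np

    def add_bebf(X, rewards, X_next, V, gamma=0.95, d_proj=32, seed=0):
        """One BEBF step: regress the Bellman residual on randomly
        projected features and return the fit as a new basis function."""
        rng = np.random.default_rng(seed)
        P = rng.normal(0, 1.0 / np.sqrt(d_proj), size=(X.shape[1], d_proj))
        residual = rewards + gamma * V(X_next) - V(X)   # Bellman error targets
        Z = X @ P                                       # random projection
        w, *_ = np.linalg.lstsq(Z, residual, rcond=None)
        return lambda Xq: (Xq @ P) @ w                  # new feature (BEBF)

    # Illustrative use with a zero initial value function.
    rng = np.random.default_rng(1)
    X, X_next = rng.random((100, 500)), rng.random((100, 500))
    f = add_bebf(X, rng.random(100), X_next, V=lambda A: np.zeros(len(A)))
    print(f(X)[:3])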

Exploration-Enhanced POLITEX [article]

Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz
2019 arXiv   pre-print
We study algorithms for average-cost reinforcement learning problems with value function approximation.  ...  Motivated by the rapid growth of interest in developing policies that learn to explore their environment in the absence of rewards (also known as no-reward learning), we replace the previous assumption that  ...  Assumption A2 (Uniformly fast mixing) Any policy π has a unique stationary distribution μ_π.  ... 
arXiv:1908.10479v1 fatcat:qrjwg5vusfcyjf3njnentk64ky
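For orientation, POLITEX computes its policy as a Boltzmann distribution over the sum of all past action-value estimates. A minimal tabular sketch (the learning rate eta and the toy Q estimates are illustrative assumptions):

    import numpy as np

    def politex_policy(Q_history, eta=0.1):
        """POLITEX-style policy: softmax of the sum of all past
        action-value estimates (rewards, so higher Q -> higher prob)."""
        Q_sum = np.sum(Q_history, axis=0)             # (S, A)
        logits = eta * Q_sum
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    Q_history = rng.normal(size=(5, 4, 3))            # 5 phases, 4 states, 3 actions
    print(politex_policy(Q_history)[0])               # action probs in state 0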

Dynamic scheduling for multi-site companies: a decisional approach based on reinforcement multi-agent learning

N. Aissani, A. Bekrar, D. Trentesaux, B. Beldjilali
2011 Journal of Intelligent Manufacturing  
In recent years, most companies have resorted to multi-site or supply-chain organization in order to improve their competitiveness and adapt to real-world conditions.  ...  In this article, a model for adaptive scheduling in multi-site companies is proposed.  ...  and distributed scheduling of supply chains.  ... 
doi:10.1007/s10845-011-0580-y fatcat:c26qzedcnzglzhz42fxhxwxgca

An Online Reinforcement Learning Approach for Dynamic Pricing of Electric Vehicle Charging Stations

Valeh Moghaddam, Amirmehdi Yazdani, Hai Wang, David Parlevliet, Farhad Shahnia
2020 IEEE Access  
The global market share of electric vehicles (EVs) is on the rise, resulting in a rapid increase in their charging demand in both spatial and temporal domains.  ...  To control the EV charging demands while supporting the utility's stability and increasing the total revenue of the charging stations, treated as a multi-agent framework, an online reinforcement learning model  ...  Moreover, the RL-Fast AHC technique enables the system to implement a unified adaptive exponential tracking which can control and filter the updated rewards for different numbers of EVs charging in different  ... 
doi:10.1109/access.2020.3009419 fatcat:fzpagljp6nh7dji6x3twmyfr6y

Continual Learning In Environments With Polynomial Mixing Times [article]

Matthew Riemer, Sharath Chandra Raparthy, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel, Irina Rish
2021 arXiv   pre-print
The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios.  ...  In particular, we establish that scalable MDPs have mixing times that scale polynomially with the size of the problem.  ...  frameworks such as temporal difference (TD) learning and actor-critic.  ... 
arXiv:2112.07066v1 fatcat:sgf7nkzcsramzh6idj3vongrae
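The mixing time referenced above is the number of steps until the chain induced by a policy is close to its stationary distribution in total variation. A sketch for an explicit transition matrix (the eigenvector computation and the eps = 1/4 threshold follow common convention; both are assumptions here):

    import numpy as np

    def mixing_time(P, eps=0.25, t_max=10_000):
        """Smallest t such that every row of P^t is within eps total
        variation distance of the stationary distribution."""
        evals, evecs = np.linalg.eig(P.T)
        pi = np.real(evecs[:, np.argmax(np.real(evals))])
        pi /= pi.sum()                                # stationary distribution
        Pt = np.eye(len(P))
        for t in range(1, t_max + 1):
            Pt = Pt @ P
            if 0.5 * np.abs(Pt - pi).sum(axis=1).max() < eps:
                return t
        return None                                   # did not mix within t_max

    P = np.array([[0.99, 0.01], [0.02, 0.98]])        # slowly mixing chain
    print(mixing_time(P))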

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization [article]

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
2021 arXiv   pre-print
In this paper, we develop a new distributed temporal-difference learning algorithm and quantify its finite-time performance.  ...  Our algorithm combines a distributed stochastic primal-dual method with a homotopy-based approach to adaptively adjust the learning rate in order to minimize the mean-square projected Bellman error by  ...  In spite of their widespread use, these algorithms can become unstable and convergence cannot be guaranteed in off-policy learning scenarios [4], [21].  ... 
arXiv:1908.02805v4 fatcat:jvvnknj7fjdszfin6bolvw2m6q
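The paper's algorithm is a homotopy primal-dual method; as simpler background for the distributed TD setting, the sketch below shows the basic pattern of local TD(0) updates followed by consensus averaging over a communication matrix (the fully connected weights, synthetic features, and step size are illustrative assumptions, not the paper's method):

    import numpy as np

    def distributed_td0(features, local_rewards, W, alpha=0.05, gamma=0.9):
        """Each agent runs linear TD(0) on its private rewards over a
        shared state trajectory, then averages parameters with neighbors."""
        n_agents, d = local_rewards.shape[0], features.shape[1]
        theta = np.zeros((n_agents, d))
        for t in range(features.shape[0] - 1):
            phi, phi_next = features[t], features[t + 1]
            for i in range(n_agents):
                delta = (local_rewards[i, t] + gamma * phi_next @ theta[i]
                         - phi @ theta[i])
                theta[i] += alpha * delta * phi       # local TD(0) step
            theta = W @ theta                         # consensus averaging
        return theta

    rng = np.random.default_rng(0)
    features = rng.normal(size=(2000, 4))
    local_rewards = rng.normal(size=(3, 2000))
    W = np.full((3, 3), 1 / 3)                        # fully connected weights
    print(distributed_td0(features, local_rewards, W).round(3))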

Control Regularization for Reduced Variance Reinforcement Learning [article]

Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick
2019 arXiv   pre-print
Dealing with high variance is a significant challenge in model-free reinforcement learning (RL).  ...  We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off.  ...  Acknowledgements This work was funded in part by Raytheon under the Learning to Fly program, and by DARPA under the Physics-Infused AI Program.  ... 
arXiv:1905.05380v1 fatcat:kgds7qewovg3tmgg7xw5acjffq
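One common form of the functional regularization described above blends the learned action with a control prior, trading variance for bias as the mixing weight grows. A minimal sketch (the convex-combination form, PD prior, and toy state are assumptions for illustration):

    import numpy as np

    def regularized_action(u_rl, u_prior, lam):
        """Blend the learned policy's action with a control prior.
        Larger lam trusts the prior more (lower variance, more bias)."""
        return (u_rl + lam * u_prior) / (1.0 + lam)

    # Illustrative: a noisy learned action pulled toward a PD-controller prior.
    x, x_dot = 0.4, -0.2                              # toy system state
    u_prior = -2.0 * x - 0.5 * x_dot                  # hand-designed PD prior
    u_rl = np.random.default_rng(0).normal(loc=-0.6, scale=0.5)
    for lam in (0.0, 1.0, 10.0):
        print(lam, regularized_action(u_rl, u_prior, lam))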

Terrain-adaptive locomotion skills using deep reinforcement learning

Xue Bin Peng, Glen Berseth, Michiel van de Panne
2016 ACM Transactions on Graphics  
MACE learns more quickly than a single actor-critic approach and results in actor-critic experts that exhibit specialization.  ...  Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features.  ...  of the sign of their temporal differences.  ... 
doi:10.1145/2897824.2925881 fatcat:b2n5ytpbqzczll2lj5adz7tjjm
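MACE's selection rule, as we understand it, is to query every expert's critic and execute the actor whose critic predicts the highest value for the current state. A minimal sketch of that rule (the toy actors and critics are illustrative):

    import numpy as np

    def mace_action(state, actors, critics):
        """Mixture of actor-critic experts: evaluate every critic and run
        the actor whose critic predicts the highest value for this state."""
        values = [critic(state) for critic in critics]
        best = int(np.argmax(values))
        return actors[best](state), best

    # Illustrative experts, each specialized for one region of the state space.
    actors = [lambda s: -1.0, lambda s: +1.0]
    critics = [lambda s: -abs(s - 0.2), lambda s: -abs(s - 0.8)]
    for s in (0.1, 0.9):
        print(s, mace_action(s, actors, critics))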

Distributed Energy-Aware Diffusion Least Mean Squares: Game-Theoretic Learning

Omid Namvar Gharehshiran, Vikram Krishnamurthy, George Yin
2013 IEEE Journal on Selected Topics in Signal Processing  
The diffusion LMS stochastic approximation is combined with a game-theoretic learning algorithm such that the overall energy-aware diffusion LMS has two timescales: the fast timescale corresponds to the  ...  game-theoretic activation mechanism, whereby nodes distributively learn their optimal activation strategies, whereas the slow timescale corresponds to the diffusion LMS.  ...  In this case, no pre-computed policy is given; nodes learn their activation policies through repeated play and exchanging information with neighbors.  ... 
doi:10.1109/jstsp.2013.2266318 fatcat:yvgmkljw2rauvf3wotvww7duza
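Diffusion LMS, the slow timescale here, has each active node take a local LMS step and then combine neighbor estimates. An adapt-then-combine sketch (the network weights and synthetic data are illustrative; the game-theoretic activation layer is omitted):

    import numpy as np

    def diffusion_lms(U, d, A, mu=0.01, iters=500):
        """Adapt-then-combine diffusion LMS: each node runs a local LMS
        step, then combines neighbor estimates with weights A[l, k]."""
        n_nodes, dim = U.shape[0], U.shape[2]
        w = np.zeros((n_nodes, dim))
        for t in range(iters):
            psi = np.empty_like(w)
            for k in range(n_nodes):
                u = U[k, t % U.shape[1]]
                psi[k] = w[k] + mu * u * (d[k, t % d.shape[1]] - u @ w[k])
            w = A.T @ psi                             # combine step
        return w

    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0])
    U = rng.normal(size=(3, 100, 2))                  # nodes x time x dim
    d = np.einsum('ktd,d->kt', U, w_true) + 0.01 * rng.normal(size=(3, 100))
    A = np.full((3, 3), 1 / 3)                        # doubly stochastic combiner
    print(diffusion_lms(U, d, A).round(2))            # approx w_true at each node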

Leeway: Addressing Variability in Dead-Block Prediction for Last-Level Caches

Priyank Faldu, Boris Grot
2017 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)  
In response, we introduce a new metric - Live Distance - that uses the stack distance to learn the temporal reuse characteristics of cache blocks, thus enabling a dead block predictor that is robust to variability  ...  We show that existing management policies are limited by the metrics they use to identify dead blocks, leading to low coverage and/or low accuracy in the face of variability.  ...  We thank the PARSA group at EPFL for providing us with disk images of CloudSuite applications and Onur Kocberber & Javier Picorel for their help in setting up these images on Flexus.  ... 
doi:10.1109/pact.2017.32 dblp:conf/IEEEpact/FalduG17 fatcat:7nu4squeivg6xe4mwqr2wqngda
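Live Distance builds on the classic stack (reuse) distance: the depth of a block in an LRU stack when it is re-referenced. A sketch of stack-distance computation only (the predictor the paper layers on top of it is not reproduced):

    def stack_distances(accesses):
        """Reuse (stack) distance per access: depth of the block in an
        LRU stack, or None on first touch. Leeway's Live Distance is
        learned from such distances over a block's live intervals."""
        stack, out = [], []
        for block in accesses:
            if block in stack:
                out.append(stack.index(block))        # 0 == most recently used
                stack.remove(block)
            else:
                out.append(None)                      # cold (first) access
            stack.insert(0, block)                    # move to MRU position
        return out

    print(stack_distances(['A', 'B', 'A', 'C', 'B', 'B', 'A']))
    # -> [None, None, 1, None, 2, 0, 2]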

Building Document Treatment Chains Using Reinforcement Learning and Intuitive Feedback

Esther Nicart, Bruno Zanuttini, Hugo Gilbert, Bruno Grilhères, Frédéric Praca
2016 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI)  
expert" chain), though the feedback given for the AIs to learn from was different.  ...  In production, to avoid learning "from scratch", we would capitalise on existing expertise by initialising BIMBO with a policy based on that of an "expert" chain.  ... 
doi:10.1109/ictai.2016.0102 dblp:conf/ictai/NicartZGGP16 fatcat:44anrubenvbxna7gjv5y7ne6zi

Augmenting Max-Weight with Explicit Learning for Wireless Scheduling with Switching Costs [article]

Subhashini Krishnasamy, Akhil P T, Ari Arapostathis, Rajesh Sundaresan, Sanjay Shakkottai
2018 arXiv   pre-print
Instead, we develop a learning and BS activation algorithm with slow temporal dynamics, and a Max-Weight based channel scheduler that has fast temporal dynamics.  ...  We show, using convergence of time-inhomogeneous Markov chains, that the co-evolving dynamics of learning, BS activation and queue lengths lead to near-optimal average energy costs along with queue stability  ...  In the setting where arrival and channel statistics are unknown, our interest is in designing policies that learn the arrival and channel statistics to make rate allocation  ... 
arXiv:1808.01618v1 fatcat:qgcgpedkm5dbjkqwbmva63a7qq
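The fast-timescale component above is a Max-Weight scheduler: each slot, serve the user maximizing the queue-length times service-rate product over active base stations. A minimal sketch (the single-server simplification and toy arrays are assumptions for illustration; the slow-timescale learning and activation layer is omitted):

    import numpy as np

    def max_weight_schedule(queues, rates, active_bs):
        """Max-Weight: among active base stations, serve the user that
        maximizes queue length x service rate in this slot."""
        weights = queues * rates * active_bs          # mask out sleeping BSs
        return int(np.argmax(weights)) if weights.max() > 0 else None

    queues = np.array([5.0, 2.0, 9.0])                # backlogs per user
    rates = np.array([1.0, 3.0, 0.5])                 # current channel rates
    active_bs = np.array([1, 1, 0])                   # BS 2 is switched off
    print(max_weight_schedule(queues, rates, active_bs))   # -> 1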