9,209 Hits in 3.9 sec

Adaptive Probabilistic Policy Reuse [chapter]

Yann Chevaleyre, Aydano Machado Pamponet
2012 Lecture Notes in Computer Science  
Based on this relation, we design a generic adaptive transfer method, which we evaluate on a grid-world task.  ...  Transfer Learners with Static Transfer Rates: In this section, we will present a state-of-the-art transfer learner, namely PPR (Probabilistic Policy Reuse), as well as PPR-decay, a variation on PPR [?].  ...  (PPR with exponential decay): $a_t = \pi(s_t)$ with probability $\phi$, and $a_t = \epsilon$-greedy$(\pi)$ with probability $1 - \phi$. PPR (Probabilistic Policy Reuse). Optimization of the Transfer Rate as a Stochastic Continuum-Armed Bandit Problem: Consider  ... 
doi:10.1007/978-3-642-34487-9_73 fatcat:xbmvvvmsffaktcqafm4kbiz4ny
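
The PPR action rule quoted above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: $\epsilon$-greedy$(\pi)$ is read here as $\epsilon$-greedy over the learner's own tabular Q-values, and PPR-decay is modelled as $\phi_t = \phi_0 \cdot d^t$ with a hypothetical decay factor `decay`. All function names are illustrative.

```python
import random
from collections.abc import Hashable
from typing import Callable, Dict, List

Action = int


def epsilon_greedy(q: Dict[Hashable, List[float]], state: Hashable,
                   epsilon: float, rng: random.Random) -> Action:
    """Random action with probability epsilon, otherwise the greedy action."""
    values = q[state]
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)


def ppr_action(past_policy: Callable[[Hashable], Action],
               q: Dict[Hashable, List[float]], state: Hashable,
               phi: float, epsilon: float, rng: random.Random) -> Action:
    """Reuse the past (source) policy with probability phi; otherwise act
    epsilon-greedily on the learner's own Q-values (assumed reading)."""
    if rng.random() < phi:
        return past_policy(state)
    return epsilon_greedy(q, state, epsilon, rng)


def decayed_phi(phi0: float, decay: float, t: int) -> float:
    """Hypothetical PPR-decay schedule: reuse probability phi0 * decay**t."""
    return phi0 * decay ** t
```

With `phi` held fixed, this corresponds to a static transfer rate; the adaptive method described in the snippet instead tunes the rate online, which this sketch does not attempt.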

Self-managing and self-organising mobile computing applications

Jose Luis Fernandez-Marquez, Giovanna Di Marzo Serugendo, Graeme Stevenson, Juan Ye, Simon Dobson, Franco Zambonelli
2014 Proceedings of the 29th Annual ACM Symposium on Applied Computing - SAC '14  
Self-organising systems are being developed in an ad-hoc way, without reusing functionalities, thus producing software that is difficult to maintain and to reuse in other applications.  ...  Additionally, because of the dynamic and heterogeneous nature of mobile networks, services need to adapt themselves in order to ensure both functional and non-functional requirements.  ...  Policy 3: In the case of the adaptive probabilistic approach, the probability of propagating the information at each node is adapted based on the number of neighbouring nodes.  ... 
doi:10.1145/2554850.2555042 dblp:conf/sac/Fernandez-MarquezSSYDZ14 fatcat:fvoatekehfg3nfcabbcvlnlfkq
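
The snippet states only that the propagation probability at each node is adapted to the number of neighbouring nodes. One common way to do this, shown here as an assumption rather than the paper's rule, is to make the probability inversely proportional to node degree, so that dense neighbourhoods generate less redundant traffic. `target_fanout` is a hypothetical tuning parameter.

```python
import random


def forward_probability(num_neighbours: int, target_fanout: float = 3.0) -> float:
    """Hypothetical adaptive rule: forward with probability target_fanout / degree.

    Nodes with at most `target_fanout` neighbours always forward; denser nodes
    forward less often. A node with no neighbours has nobody to forward to.
    """
    if num_neighbours <= 0:
        return 0.0
    return min(1.0, target_fanout / num_neighbours)


def should_propagate(num_neighbours: int, rng: random.Random,
                     target_fanout: float = 3.0) -> bool:
    """Draw a single forwarding decision for one node and one message."""
    return rng.random() < forward_probability(num_neighbours, target_fanout)
```

For example, with the default `target_fanout=3.0`, a node with 6 neighbours forwards with probability 0.5, while a node with 2 neighbours always forwards.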

Dynamic and discrete cache insertion policies for managing shared last level caches in large multicores

Aswinkumar Sridharan, André Seznec
2017 Journal of Parallel and Distributed Computing  
ADAPT outperforms prior cache replacement policies.  ...  The insertion policy is altered to achieve this effect. The inserted cache lines are updated to the MRU priority only probabilistically (with probability 1/32).  ... 
doi:10.1016/j.jpdc.2017.02.004 fatcat:24hmq6ycqnhhxcmyt2hzaxrmpu
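
The probabilistic MRU promotion mentioned above can be illustrated with a simplified LRU-stack model of one cache set. This is an assumption-laden sketch in the spirit of bimodal insertion, not ADAPT itself (ADAPT's actual priority levels are not modelled): on a miss the new line enters at the LRU position and is placed at MRU only with probability 1/32, while hits are assumed to promote the line to MRU. The class name is hypothetical.

```python
import random
from collections.abc import Hashable
from typing import List


class ProbabilisticInsertionSet:
    """One cache set modelled as a recency stack (index 0 = MRU, last = LRU).

    Misses insert the new line at the LRU end and only place it at MRU with
    probability `promote_prob` (1/32 by default, the rate quoted above).
    Hits are assumed to move the line to MRU.
    """

    def __init__(self, ways: int, promote_prob: float = 1 / 32, seed: int = 0):
        self.ways = ways
        self.promote_prob = promote_prob
        self.stack: List[Hashable] = []
        self.rng = random.Random(seed)

    def access(self, tag: Hashable) -> bool:
        """Access one line; return True on a hit and False on a miss."""
        if tag in self.stack:
            self.stack.remove(tag)
            self.stack.insert(0, tag)  # hit: promote to MRU
            return True
        if len(self.stack) >= self.ways:
            self.stack.pop()  # evict the current LRU line
        if self.rng.random() < self.promote_prob:
            self.stack.insert(0, tag)  # rare: insert directly at MRU
        else:
            self.stack.append(tag)  # common: insert at LRU
        return False
```

Because most new lines enter at the LRU end, a stream of single-use lines tends to cycle through the LRU slot instead of flushing lines that are being reused.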

Tactile guidance for policy refinement and reuse

Brenna D. Argall, Eric L. Sauser, Aude G. Billard
2010 2010 IEEE 9th International Conference on Development and Learning  
This work presents an approach for policy improvement and adaptation through a tactile interface located on the body of a robot.  ...  We introduce the Tactile Policy Correction (TPC) algorithm, which employs tactile feedback for the refinement of a demonstrated policy, as well as its reuse for the development of other policies.  ...  To assist the policy development process, our work employs two policy adaptation techniques: refinement and reuse.  ... 
doi:10.1109/devlrn.2010.5578872 dblp:conf/icdl/ArgallSB10 fatcat:3zzq74l7drbdtfdnr4fqkc53ry

Neural probabilistic motor primitives for humanoid control [article]

Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess
2019 arXiv   pre-print
The trained neural probabilistic motor primitive system can perform one-shot imitation of whole-body humanoid behaviors, robustly mimicking unseen trajectories.  ...  To support the training of our model, we compare two approaches for offline policy cloning, including an experience-efficient method which we call linear feedback policy cloning.  ...  Figure 6: Reuse of neural probabilistic motor primitive modules. $z_t$. The actual action is then given by the motor primitive module $p(a_t \mid s_t, z_t)$.  ... 
arXiv:1811.11711v2 fatcat:pulp4gc5vrdvpibthme367cufm

Iterative learning of grasp adaptation through human corrections

Eric L. Sauser, Brenna D. Argall, Giorgio Metta, Aude G. Billard
2012 Robotics and Autonomous Systems  
We demonstrate grasp adaptation in response to changes in contact, and show successful model reuse and improved adaptation with additional rounds of model refinement.  ...  In this work, we introduce an approach for grasp adaptation which learns a statistical model to adapt hand posture solely based on the perceived contact between the object and fingers.  ...  (a) Grasp adaptation to external perturbation. (b) Policy development through refinement and reuse.  ... 
doi:10.1016/j.robot.2011.08.012 fatcat:lgfxld5lsncmhb67xci3wkfmsm

Lifelong transfer learning with an option hierarchy

Majd Hawasly, Subramanian Ramamoorthy
2013 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems  
We use a probabilistic mixture model to describe regions in state space which are common to successful trajectories in different instances.  ...  Then, we extract policy fragments from previously-learnt policies that are specialised to these regions.  ...  We employ a principled probabilistic method to decompose the state space, and relate the learnt abstraction with policy fragments through policy reuse.  ... 
doi:10.1109/iros.2013.6696523 dblp:conf/iros/HawaslyR13 fatcat:qqzgowf7xnetnhkpmmmtt556pe

An Adaptive Markov Model for the Timing Analysis of Probabilistic Caches

Chao Chen, Giovanni Beltrame
2017 ACM Transactions on Design Automation of Electronic Systems  
Conclusions: In this paper, we have demonstrated an adaptive Markov-chain-based Static Probabilistic Timing Analysis (SPTA) methodology.  ...  it to be optimal when only reuse distance is known.  ... 
doi:10.1145/3123877 fatcat:aeyhhyzdrngbpcyikuhvsnqxwm
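
Reuse-distance-based SPTA for random-replacement caches rests on a simple survival argument, sketched below. Assuming an evict-on-miss cache in which each miss to a set evicts one of its $N$ lines uniformly and independently at random (an assumption; the paper's adaptive Markov model tracks more state than this), a cached line survives $k$ such misses with probability $((N-1)/N)^k$. The function name is illustrative. Counting every intervening access as a miss gives a pessimistic estimate, since more misses can only lower the result.

```python
from fractions import Fraction


def survival_probability(ways: int, intervening_misses: int) -> Fraction:
    """P(line still cached) after `intervening_misses` misses to its set,
    assuming each miss evicts one of `ways` lines uniformly at random."""
    if ways < 1 or intervening_misses < 0:
        raise ValueError("need ways >= 1 and intervening_misses >= 0")
    # Each miss spares a given line with probability (ways - 1) / ways, and
    # victim choices are independent, so the per-miss probabilities multiply.
    return Fraction(ways - 1, ways) ** intervening_misses
```

For a 4-way set and 2 intervening misses, this gives exactly 9/16.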

Tactile Guidance for Policy Adaptation

Brenna D. Argall, Eric L. Sauser
2010 Foundations and Trends in Robotics  
We introduce the Tactile Policy Correction (TPC) algorithm, which employs tactile feedback for the refinement of a demonstrated policy, as well as its reuse for the development of other policies.  ...  The performance of the demonstrated policy is found to improve with tactile corrections.  ...  Adaptation for Policy Reuse: When tactile corrections are provided for the purpose of policy reuse, existing points within the set D are modified.  ... 
doi:10.1561/2300000012 fatcat:ru6ivonpkfc33octumxm6w5ola

Discrete Cache Insertion Policies for Shared Last Level Cache Management on Large Multicores

Aswinkumar Sridharan, Andre Seznec
2016 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
In this study, we introduce Adaptive Discrete and deprioritized Application PrioriTization (ADAPT), an LLC management policy addressing large multicores where the LLC associativity degree is smaller  ...  ADAPT builds on the use of the Footprint-number metric.  ...  ADAPT outperforms prior cache replacement policies.  ... 
doi:10.1109/ipdps.2016.30 dblp:conf/ipps/SridharanS16 fatcat:bq2i7o7kn5be7jkhda424erpwe

Activation and Spreading Sequence for Spreading Activation Policy Selection Method in Transfer Reinforcement Learning

Hitoshi Kono, Ren Katayama, Yusaku Takakuwa, Wen Wen, Tsuyoshi Suzuki
2019 International Journal of Advanced Computer Science and Applications  
Moreover, transfer learning enables reuse of a prior policy and is effective for adapting to new environments. However, humans determine which methods are applicable in transfer learning.  ...  For example, a robot can explore for an optimal policy with trial and error using reinforcement learning.  ...  Probabilistic Policy Reuse: Fernández et al. proposed a policy selection method using probabilities in [15], [16].  ... 
doi:10.14569/ijacsa.2019.0101202 fatcat:guvqucm5bbenhej5c4zzpt2x24

Using Aggressor Thread Information to Improve Shared Cache Management for CMPs

Wanli Liu, D. Yeung
2009 2009 18th International Conference on Parallel Architectures and Compilation Techniques  
To see if we can approximate ORACLE-VT, we develop AGGRESSOR-VT, a policy that probabilistically victimizes aggressor threads with strong bias.  ...  Shared cache allocation policies play an important role in determining CMP performance. The simplest policy, LRU, allocates cache implicitly as a consequence of its replacement decisions.  ...  Rather than adapt policies to per-set interference variation, Adaptive Set Pinning (ASP) [18] re-directs references destined to high-interference sets into per-processor caches.  ... 
doi:10.1109/pact.2009.13 dblp:conf/IEEEpact/LiuY09 fatcat:ke4rbiln4zafvm52zvquvctv4q
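
A victim-selection step that "probabilistically victimizes aggressor threads with strong bias" might look like the following. This is a hypothetical sketch, not AGGRESSOR-VT's published mechanism: `bias` is an assumed parameter, lines in a set are kept MRU-first as (tag, owner thread) pairs, and how aggressor threads are identified is left outside the function.

```python
import random
from typing import List, Set, Tuple


def choose_victim(lines: List[Tuple[int, int]], aggressors: Set[int],
                  bias: float, rng: random.Random) -> int:
    """Return the index of the line to evict from one cache set.

    `lines` holds (tag, owner_thread) pairs ordered MRU-first, so the last
    entry is the global LRU line. With probability `bias`, evict the
    least-recently-used line owned by an aggressor thread if one exists;
    otherwise fall back to plain LRU.
    """
    if not lines:
        raise ValueError("cannot choose a victim from an empty set")
    if rng.random() < bias:
        # Scan from the LRU end towards MRU for the oldest aggressor-owned line.
        for i in range(len(lines) - 1, -1, -1):
            if lines[i][1] in aggressors:
                return i
    return len(lines) - 1
```

Setting `bias` close to 1 approximates always evicting aggressor lines first, while a value below 1 occasionally falls back to plain LRU so that non-aggressor lines can still be replaced.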

An Optimal Online Method of Selecting Source Policies for Reinforcement Learning [article]

Siyuan Li, Chongjie Zhang
2017 arXiv   pre-print
This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse.  ...  In this paper, we develop an optimal online method to select source policies for reinforcement learning.  ...  j(k) = $T_j(k - 1)$; end if; end for; return $Q(s, a)$  ...  $\pi_{past}$ with random policy $\pi_r$ probabilistically in the policy reuse strategy demonstrated by Algorithm 2.  ... 
arXiv:1709.08201v1 fatcat:munhuiwssfcf7iiqekw27lyl54
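
The bandit view of source-policy selection can be sketched with UCB1, treating each source policy as an arm and the return of an episode that reuses it as the reward. UCB1 is a standard stand-in chosen here for illustration; the paper's own selection rule and its Q-learning integration may differ, and `run_episode` is a hypothetical callback. The per-arm pull counts are analogous to counters such as $T_j(k)$ in the quoted fragment, though that reading is an assumption.

```python
import math
from typing import Callable, List


def ucb1_select(counts: List[int], means: List[float]) -> int:
    """Index of the source policy (arm) with the highest UCB1 score.
    Arms that have never been tried are chosen first."""
    for j, n in enumerate(counts):
        if n == 0:
            return j
    total = sum(counts)
    scores = [means[j] + math.sqrt(2.0 * math.log(total) / counts[j])
              for j in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)


def select_source_policies(run_episode: Callable[[int], float],
                           num_sources: int, episodes: int) -> List[int]:
    """Each episode: pick a source policy to reuse, observe the return,
    and update that arm's pull count and running mean. Returns pull counts."""
    counts = [0] * num_sources
    means = [0.0] * num_sources
    for _ in range(episodes):
        j = ucb1_select(counts, means)
        reward = run_episode(j)
        counts[j] += 1
        means[j] += (reward - means[j]) / counts[j]  # incremental mean update
    return counts
```

With deterministic returns of 0.1, 0.9 and 0.5 for three source policies, most of 200 pulls go to the second policy, while the other two are still tried occasionally.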

WARP: Workload Nature Adaptive Replacement Policy

Balaji S, Gautham Shankar R, Arvind Krishna P
2013 International Journal of Computer Applications  
WARP redesigns the replacement policy in the last-level cache.  ...  By this method, a set of victims is presented, over which any replacement policy can be applied to select a viable victim.  ...  This may happen as a result of cache lines with near reuse being retained by the LLC, in contrast to others with distant reuse, when the replacement policy adapts to the sharing nature of the workload  ... 
doi:10.5120/11978-7849 fatcat:7f6yo5p7n5ftferrqekvdusgqm

Quantum Architecture Search via Continual Reinforcement Learning [article]

Esther Ye, Samuel Yen-Chi Chen
2021 arXiv   pre-print
In this paper, we present the Probabilistic Policy Reuse with deep Q-learning (PPR-DQL) framework to tackle this circuit design challenge.  ...  Probabilistic Policy Reuse with Deep Q-Learning for QAS: To build an efficient RL framework for quantum architecture search problems, we contribute the Probabilistic Policy Reuse with deep Q-learning (PPR-DQL  ...  We showed numerically that deep Q-learning with probabilistic policy reuse (PPR) lets the RL agent learn a good policy for the construction of a quantum gate sequence in various unseen environments  ... 
arXiv:2112.05779v1 fatcat:a6b2fx5wnfeglmrnqlts5hritu
Showing results 1-15 out of 9,209 results