1,978 Hits in 10.7 sec

Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis [article]

Keyan Zahedi and Georg Martius and Nihat Ay
2013 arXiv   pre-print
Three different experiments are presented and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic  ...  policy gradient setting.  ...  The approach evaluated in this paper was to linearly combine the PI with an external reward signal in an episodic policy gradient setting.  ... 
arXiv:1309.6989v1 fatcat:7hotimj2qzhi7dqkoxejdx6cwa
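The blending the abstract describes can be captured in a few lines. A minimal sketch, assuming a single convex mixing coefficient `beta` and a precomputed one-step PI estimate (both illustrative; the paper's exact parameterization may differ):

```python
def combined_reward(pi_onestep: float, external_reward: float, beta: float = 0.5) -> float:
    """Linearly blend the one-step predictive information (PI) with an
    external reward. `beta` is a hypothetical mixing weight, not a value
    taken from the paper."""
    return beta * pi_onestep + (1.0 - beta) * external_reward
```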

Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

Keyan Zahedi, Georg Martius, Nihat Ay
2013 Frontiers in Psychology  
Three different experiments are presented and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic  ...  policy gradient setting.  ...  Nevertheless, the results of the experiments presented in this work show that the one-step PI should not be combined in this way with an ERF in an episodic policy gradient setting.  ... 
doi:10.3389/fpsyg.2013.00801 pmid:24204351 pmcid:PMC3816314 fatcat:oi5bhb7zqfhirkfnj5qvixsxlq

Unsupervised Predictive Memory in a Goal-Directed Agent [article]

Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley (+11 others)
2018 arXiv   pre-print
We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling.  ...  An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format.  ...  Acknowledgments We thank David Silver, Larry Abbott, and Charles Blundell for helpful comments on the manuscript; Daan Wierstra, Neil Rabinowitz, Ari Morcos, Nicolas Heess, Alex Graves, Dharshan Kumaran  ... 
arXiv:1803.10760v1 fatcat:frzec2ieuvahnftr7stem4lrma

Optimizing Agent Behavior over Long Time Scales by Transporting Value [article]

Chia-Chun Hung, Timothy Lillicrap, Josh Abramson, Yan Wu, Mehdi Mirza, Federico Carnevale, Arun Ahuja, Greg Wayne
2018 arXiv   pre-print
Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel". We dwell on our actions in the past and experience satisfaction or regret.  ...  This process endows us with a computationally important ability to link actions and consequences across long spans of time, which figures prominently in addressing the problem of long-term temporal credit  ...  deep LSTM "controller" network and an external memory that stores a history of the past; its output combines with the encoded observation to produce a state variable representing information about the  ... 
arXiv:1810.06721v2 fatcat:sos65kc5s5dcnj7q7t4ng6oxge

Optimizing agent behavior over long time scales by transporting value

Chia-Chun Hung, Timothy Lillicrap, Josh Abramson, Yan Wu, Mehdi Mirza, Federico Carnevale, Arun Ahuja, Greg Wayne
2019 Nature Communications  
More than storytelling, these recollections change how we act in the future and endow us with a computationally important ability to link actions and consequences across spans of time, which helps address  ...  Humans prolifically engage in mental time travel. We dwell on past actions and experience satisfaction or regret.  ...  Code availability The simulation environments and a non-distributed working agent implementation will be made available on publication at https://github.com/deepmind/tvt.  ... 
doi:10.1038/s41467-019-13073-w pmid:31745075 pmcid:PMC6864102 fatcat:oezdrzx375arfpq3yuujr3axvu

Learning to Navigate in Complex Environments [article]

Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell
2017 arXiv   pre-print
Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents.  ...  In particular, we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks.  ...  ACKNOWLEDGEMENTS We would like to thank Alexander Pritzel, Thomas Degris and Joseph Modayil for useful discussions, Charles Beattie, Julian Schrittwieser  ... 
arXiv:1611.03673v3 fatcat:wprf6i5xvrcfbed2ynz2zwjjbi
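The jointly learned objective the snippet mentions can be sketched as a weighted sum of the RL loss and the two auxiliary losses. A minimal PyTorch sketch, assuming depth prediction is cast as classification over discretized depth bins and loop closure as binary classification (the auxiliary weights are illustrative assumptions, not values from the paper):

```python
import torch.nn.functional as F

def joint_loss(rl_loss, depth_logits, depth_targets, loop_logits, loop_targets,
               beta_depth=0.1, beta_loop=0.1):
    # Auxiliary depth prediction: classification over discretized depth bins.
    depth_loss = F.cross_entropy(depth_logits, depth_targets)
    # Auxiliary loop-closure detection: binary classification (float targets).
    loop_loss = F.binary_cross_entropy_with_logits(loop_logits, loop_targets)
    # beta_* are hypothetical weights chosen for illustration.
    return rl_loss + beta_depth * depth_loss + beta_loop * loop_loss
```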

Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning [article]

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor
2019 arXiv   pre-print
Although TP could be integrated with multiple algorithms, this paper focuses on Asynchronous Advantage Actor-Critic (A3C), demonstrating the advantages of A3C-TP.  ...  In this paper, we contribute a novel self-supervised auxiliary task, i.e., Terminal Prediction (TP), estimating temporal closeness to terminal states for episodic tasks.  ...  Preliminaries We start with the standard reinforcement learning setting of an agent interacting in an environment over a discrete number of steps.  ... 
arXiv:1907.10827v1 fatcat:pmmpai7bmfgvfpqb5fmsu3spim
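Terminal Prediction has a particularly compact form: at step t of an episode of length T, the agent regresses its temporal closeness to the terminal state. A minimal sketch, assuming the target t/T and a mean-squared-error auxiliary loss (a natural reading of the abstract, not necessarily the paper's exact recipe):

```python
def terminal_prediction_loss(predictions, episode_length):
    """Auxiliary TP loss over one finished episode.

    predictions: per-step scalar outputs of the TP head, length T.
    The target at step t is t / T, i.e. normalized closeness to the end.
    """
    T = episode_length
    targets = [t / T for t in range(1, T + 1)]
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / T
```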

Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

Muhammad Burhan Hafez, Cornelius Weber, Matthias Kerzel, Stefan Wermter
2019 Paladyn: Journal of Behavioral Robotics  
Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic  ...  Our neural architecture is composed of a critic and an actor network.  ...  We ran Deep ICAC, Deep CACLA, and DDPG on dense-reward and sparse-reward environments for 10K episodes with a maximum of 10 steps per episode, with the position of the target object varying randomly every  ... 
doi:10.1515/pjbr-2019-0005 fatcat:vngwa4ig7bamnpbchzp43sv7cm
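The intrinsic signal described here is learning progress: how much the ensemble's prediction error drops as the world models improve. A minimal sketch, assuming mean error over the ensemble and an additive mixing coefficient `eta` (both assumptions for illustration):

```python
import numpy as np

def learning_progress(prev_errors, curr_errors):
    """Intrinsic reward as the drop in the ensemble's mean prediction
    error between consecutive evaluations of the world models."""
    return float(np.mean(prev_errors) - np.mean(curr_errors))

def combined_reward(r_extrinsic, r_intrinsic, eta=0.1):
    # eta is a hypothetical mixing coefficient, not from the paper.
    return r_extrinsic + eta * r_intrinsic
```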

The Applicability of Reinforcement Learning Methods in the Development of Industry 4.0 Applications

Tamás Kegyes, Zoltán Süle, János Abonyi, Murari Andrea
2021 Complexity  
Our article gives a systematic overview of major types of RL methods and their applications in the field of Industry 4.0 solutions, and it provides methodological guidelines to determine the right approach  ...  that can be better fitted to the different problems; moreover, it can be a point of reference for R&D projects and further research.  ...  Acknowledgments This work was supported by the TKP2020-NKA-10 project financed under the 2020-4.1.1-TKP2020 Thematic Excellence Programme by the National Research, Development and Innovation Fund of Hungary  ... 
doi:10.1155/2021/7179374 fatcat:ztjbtix4rbh53hummt275itvxi

Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination [article]

Muhammad Burhan Hafez, Cornelius Weber, Matthias Kerzel, Stefan Wermter
2020 arXiv   pre-print
However, dual-system approaches fail to consider the reliability of the learned model when it is applied to make multiple-step predictions, resulting in a compounding of prediction errors and performance  ...  In this paper, we present a novel dual-system motor learning approach where a meta-controller arbitrates online between model-based and model-free decisions based on an estimate of the local reliability  ...  Acknowledgments This work was supported by the German Academic Exchange Service (DAAD) funding programme (No. 57214224) with partial support from the German Research Foundation DFG under project CML (TRR  ... 
arXiv:2004.08830v2 fatcat:zk5xb2szdbafbhvpdgbeftbwwu
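The arbitration step can be sketched directly from the abstract: plan with the learned model only where it is locally reliable. A minimal sketch, assuming a scalar error estimate and a fixed threshold (the paper's meta-controller is learned, so treat this as an illustrative stand-in):

```python
def select_action(state, planner, policy, model_error_estimate, threshold=0.05):
    """Arbitrate online between model-based and model-free control based
    on an estimate of the learned model's local reliability."""
    if model_error_estimate(state) < threshold:
        return planner(state)   # model locally reliable: use model-based planning
    return policy(state)        # otherwise fall back to the model-free policy
```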

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions [article]

Amit Kumar Mondal
2020 arXiv   pre-print
My analysis pointed out that most of the models focused on tuning policy values rather than tuning other things in a particular state of reasoning.  ...  Reinforcement learning is one of the core components in designing an artificially intelligent system emphasizing real-time response.  ...  Consider an episode with a state sequence s_1, ..., s_T and a goal g ≠ s_1, ..., s_T; the policy gets a reward of -1 as long as it is not in the target state.  ... 
arXiv:2001.06921v2 fatcat:uwqn4jmginf73ouk3zmm45uozy
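The goal-conditioned reward in the quoted example is fully specified by the snippet: -1 at every step until the target state is reached.

```python
def goal_reward(state, goal):
    """Sparse goal-conditioned reward from the survey's example:
    -1 as long as the agent is not in the target state, 0 on success."""
    return 0.0 if state == goal else -1.0
```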

A Robust Approach for Continuous Interactive Actor-Critic Algorithms

Cristian Millan-Arias, Bruno Fernandes, Francisco Cruz, Richard Dazeley, Sergio Fernandes
2021 IEEE Access  
In this paper, we propose an approach that addresses interactive reinforcement learning problems in a dynamic environment, where advice provides information on the task and the dynamics of the environment  ...  Thus, an agent learns a policy in a disturbed environment while receiving advice.  ...  In this way, an agent can learn a proper policy independent of any external perturbation or unforeseen change in the environment, while an external trainer provides informative advice.  ... 
doi:10.1109/access.2021.3099071 fatcat:qqp547i2r5ajrhvltfdnkovsne
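One common way to realize "learning while receiving advice" is probabilistic advice-taking during action selection. A minimal sketch under that assumption (the paper integrates advice into a continuous actor-critic update, so this is only a simplified stand-in; `p_advice` is a hypothetical parameter):

```python
import random

def act(state, policy, trainer_advice, p_advice=0.2):
    """With probability p_advice, follow the external trainer's advice
    instead of the agent's own policy."""
    advice = trainer_advice(state)
    if advice is not None and random.random() < p_advice:
        return advice
    return policy(state)
```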

Magnetic control of tokamak plasmas through deep reinforcement learning

Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz (+19 others)
2022 Nature  
We successfully produce and control a diverse set of plasma configurations on the Tokamak à Configuration Variable [1,2], including elongated, conventional shapes, as well as advanced configurations, such  ...  In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils.  ...  Acknowledgements We gratefully acknowledge the work and support of the TCV team (see the author list of Coda et al. [2]) in enabling these experimental results. We thank C. Wüthrich and Y.  ... 
doi:10.1038/s41586-021-04301-9 pmid:35173339 pmcid:PMC8850200 fatcat:kf5gmj3gmrgpfhbuwtdpfmui3a

Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems [article]

Vinicius G. Goecks
2020 arXiv   pre-print
Results presented in this work show that the reward signal that is learned based upon human interaction accelerates the rate of learning of reinforcement learning algorithms and that learning from a combination  ...  Finally, Cycle-of-Learning develops an effective transition between policies learned using human demonstrations and interventions to reinforcement learning.  ...  In this work, we propose the Cycle-of-Learning (CoL) framework that uses an actor-critic architecture with a loss function that combines behavior cloning and 1-step Q-learning losses with an off-policy  ... 
arXiv:2008.13221v1 fatcat:aofoenmwcvckvagbttrkskevty
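The combined loss named in the snippet pairs behavior cloning with a 1-step Q-learning term. A minimal PyTorch sketch, assuming continuous actions and simple squared-error forms for both terms (the full CoL objective also includes an actor term, and the weights here are hypothetical):

```python
import torch.nn.functional as F

def col_loss(pi_actions, demo_actions, q_pred, q_target,
             lambda_bc=1.0, lambda_q=1.0):
    """Behavior cloning + 1-step Q-learning, combined additively.
    lambda_bc and lambda_q are illustrative weights, not from the paper."""
    bc_loss = F.mse_loss(pi_actions, demo_actions)   # imitate human demonstrations
    td_loss = F.mse_loss(q_pred, q_target.detach())  # 1-step temporal-difference error
    return lambda_bc * bc_loss + lambda_q * td_loss
```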

Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task

Xiaohan Zhang, Lu Liu, Guodong Long, Jing Jiang, Shenquan Liu
2020 Neural Networks  
The results indicate that salient events stored in the hippocampus could be prioritized to propagate reward information, and thus allow decision-makers to learn a strategy faster.  ...  Furthermore, we conduct behavioral experiments on our framework, trying to explore an open question in neuroscience: which episodic memory in the hippocampus should be selected to ultimately govern future  ...  Acknowledgement This work was supported by the National Natural Science Foundation of China (Grant Nos. 11572127 and 11172103).  ... 
doi:10.1016/j.neunet.2020.11.003 pmid:33276194 fatcat:qltsix5mcrauvgroxjjopwgjva
Showing results 1 — 15 out of 1,978 results