
Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration [article]

Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht
2022 arXiv   pre-print
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters  ...  In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation.  ...  DECOUPLED REINFORCEMENT LEARNING In this work, we propose to decouple exploration and exploitation into two separate policies to improve sample efficiency and reduce sensitivity to hyperparameters of intrinsic  ... 
arXiv:2107.08966v3 fatcat:2uqxi6z52nfy3anjmhp6ek7gw4

Hierarchical principles of embodied reinforcement learning: A review [article]

Manfred Eppe, Christian Gumbsch, Matthias Kerzel, Phuong D.H. Nguyen, Martin V. Butz, Stefan Wermter
2020 arXiv   pre-print
Among the most promising computational approaches to provide comparable learning-based problem-solving abilities for artificial agents and robots is hierarchical reinforcement learning.  ...  Cognitive Psychology and related disciplines have identified several critical mechanisms that enable intelligent biological agents to learn to solve complex problems.  ...  Intrinsic motivation is a useful method to stabilise reinforcement learning by supplementing sparse external rewards. It is also commonly used to incentivize exploration.  ... 
arXiv:2012.10147v1 fatcat:dfkdehyz2rggtimmlcmtvycpxe

Never Give Up: Learning Directed Exploration Strategies [article]

Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell
2020 arXiv   pre-print
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.  ...  We employ the framework of Universal Value Function Approximators (UVFA) to simultaneously learn many directed exploration policies with the same neural network, with different trade-offs between exploration  ...  We use reinforcement learning to approximate the optimal value function corresponding to several different weightings of intrinsic rewards.  ... 
arXiv:2002.06038v1 fatcat:jlhm6h4afnbyjhzdslyarwp2xy

Multiagent Deep Reinforcement Learning: Challenges and Directions Towards Human-Like Approaches [article]

Annie Wong, Thomas Bäck, Anna V. Kononova, Aske Plaat
2021 arXiv   pre-print
solutions in multiagent reinforcement learning.  ...  This paper surveys the field of multiagent deep reinforcement learning.  ...  The authors have no conflicts of interest to declare that are relevant to the content of this article. Availability of data and material  ... 
arXiv:2106.15691v1 fatcat:7sy6cianq5dh5a7n6clvjdlrxy

Deep Reinforcement Learning [article]

Yuxi Li
2018 arXiv   pre-print
We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources.  ...  Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn.  ...  Lanctot et al. (2017) observe that independent RL, in which each agent learns by interacting with the environment, oblivious to other agents, can overfit the learned policies to other agents' policies  ... 
arXiv:1810.06339v1 fatcat:kp7atz5pdbeqta352e6b3nmuhy

Intelligent Explorations of the String Theory Landscape [article]

Andrei Constantin
2022 arXiv   pre-print
to make a difference.  ...  Recently, the incorporation of artificial intelligence in string theory and certain theoretical advancements have brought to light unexpected solutions to mathematical hurdles that have so far hindered  ...  discussion of which we now turn. 3 Reinforcement Learning and Genetic Algorithms Reinforcement Learning Reinforcement learning (RL) is a machine learning approach in which an artificial intelligence  ... 
arXiv:2204.08073v2 fatcat:tqzcw7vmzzep5eny7kvb6vci6y

The resilient state: new regulatory modes in international approaches to state building?

Jan Pospisil, Florian P. Kühn
2016 Third World Quarterly  
intrinsically linked to this question, like, for example, 'fragile states'.  ...  Causality of interventions and effects is henceforth decoupled, responsibility of external actors and agencies obfuscated.  ... 
doi:10.1080/01436597.2015.1086637 fatcat:5clyzx7x7jgxtanraquw4pncmq

Money as tool, money as drug: The biological psychology of a strong incentive

Stephen E. G. Lea, Paul Webley
2006 Behavioral and Brain Sciences  
Specifically, what could be the biological basis for the extraordinary incentive and reinforcing power of money, which seems to be unique to the human species?  ...  The classic examples of this process are psychoactive drugs, but we argue that the drug concept can also be extended metaphorically to provide an account of money motivation.  ...  Money motivations are learned in the same way that manners are learned.  ... 
doi:10.1017/s0140525x06009046 pmid:16606498 fatcat:w5fvcxlp65gxpjnbixtssmg44u

The Virtue of Governance, the Governance of Virtue

Geoff Moore
2012 Business Ethics Quarterly  
First, we need to understand the internal contradictions of the tradition that has developed of how to 'do' business. Then we need the virtues to be exercised inside practices and institutions.  ...  Even though governance is usually taken to 'crowd out' virtue, this article proposes an approach to governance that 'crowds in' virtue.  ...  This brings us back to the issue of intrinsic versus extrinsic motivation.  ... 
doi:10.5840/beq201222221 fatcat:hws2ev54dndezejqddlw3uvvkm

Coordination and Learning in Wikipedia: Revisiting the Dynamics of Exploitation and Exploration [chapter]

Aleksi Aaltonen, Jannis Kallinikos
2012 Research in the Sociology of Organizations  
Exploiting and Exploring People's Potentials  ...  The evolution of Wikipedia betrays an increasing reliance on policies and guidelines, signalling certain stabilisation in the knowledge making processes underlying the encyclopaedia.  ...  The empirical relationships depicted in this chapter are based on a dataset that covers nine years of contributions, from January 2001 to January 2010.  ... 
doi:10.1108/s0733-558x(2013)0000037010 fatcat:miidb6glpjbkrnmkkbh24mdaou

Reframing PTSD for computational psychiatry with the active inference framework

Adam Linson, Karl Friston
2019 Cognitive Neuropsychiatry  
Recent advances in research on stress and, respectively, on disorders of perception, learning, and behaviour speak to a promising synthesis of current insights from (i) neurobiology, cognitive neuroscience  ...  Methods: Specifically, we apply this synthesis to PTSD.  ...  ., reinforcement learning and optimal control theory, such that it replaces the optimisation of expected reward with the minimisation of expected surprise (where surprise can include negative rewards).  ... 
doi:10.1080/13546805.2019.1665994 pmid:31564212 pmcid:PMC6816477 fatcat:vjqru2ptzbgupnwghbzxfo2anu

Integrating reinforcement learning, equilibrium points, and minimum variance to understand the development of reaching: A computational model

Daniele Caligiore, Domenico Parisi, Gianluca Baldassarre
2014 Psychological Review  
This article contributes to overcome this gap by proposing a computational model based on three key hypotheses: (a) trial-and-error learning processes drive the progressive development of reaching; (b)  ...  to increase accuracy.  ...  ICT-IP-231722, project "IM-CLeVeR - Intrinsically Motivated Cumulative Learning Versatile Robots". We thank Stefano Zappacosta for helping with the statistical analyses.  ... 
doi:10.1037/a0037016 pmid:25090425 fatcat:a4xsbizhufaive6qpjb6pyeryi

An Empirical Framework for Objective Testing for P-Consciousness in an Artificial Agent

Colin Hales
2009 Open Artificial Intelligence Journal  
As such it is intrinsically afforded a status of critical dependency demonstrably no different to any other critical dependency in science, making scientific behaviour ideally suited to a self-referential  ...  This document approaches both issues by exploring the idea of using scientific behaviour self-referentially as a benchmark in an objective test for P-consciousness, which is the relevant critical aspect  ...  ACKNOWLEDGEMENTS I am indebted to Associate Professor Russel Standish, Dr. David Grayden, Wanda Ginnane and unknown reviewers for helpful suggestions and comments.  ... 
doi:10.2174/1874061800903010001 fatcat:i7czzurhsfhhdo5vg3ovkk72qq

Towards a Cross-Level Theory of Neural Learning

Anthony J. Bell, Kevin H. Knuth, Ariel Caticha, Julian L. Center, Adom Giffin, Carlos C. Rodríguez
2007 AIP Conference Proceedings  
A mismatch between the resulting spike-learning algorithm and the known physiological processes of synaptic plasticity is then used as a motivation to introduce the rather obvious idea that neurons are  ...  This paper reviews ideas and results from unsupervised learning theory that have given the best explanation yet of how neural firing rates self-organise to code natural images in area V1 of visual cortex  ...  for chances to air these issues The work was funded by the Swartz Foundation and the NSF Science of Learning Center (TDLC).  ... 
doi:10.1063/1.2821301 fatcat:chux4szumjgdlogxwl2osn3fnm

Designer Ecosystems for the Anthropocene—Deliberately Creating Novel Ecosystems in Cultural Landscapes

Jason Alexandra
2022 Sustainability  
While the ideals of preserving wilderness and conserving ecosystems have motivated much conservation effort to date, achieving these ideals may not be feasible under Anthropocene conditions unless communities  ...  The paper also draws on the literature about cultural landscapes, ecological design, agroecology and permaculture to explore options for applying ecological design as a planning and problem-solving framework  ...  While the fourth explores why explicit recognition of the cultural landscapes is needed to guide ecosystem management.  ... 
doi:10.3390/su14073952 fatcat:atllq2w645b35kspb3wr4rn5ve