49,187 Hits in 3.2 sec

Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming [chapter]

Richard S. Sutton
1990 Machine Learning Proceedings 1990  
Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world.  ...  Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes.  ...  Reinforcement learning with hypothetical experience is in fact an incremental form of planning that is closely related to dynamic programming.  ... 
doi:10.1016/b978-1-55860-141-3.50030-4 dblp:conf/icml/Sutton90 fatcat:zsazneymvfbpbdiec6walpjcvu

Dyna, an integrated architecture for learning, planning, and reacting

Richard S. Sutton
1991 ACM SIGART Bulletin  
Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world.  ...  Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes.  ...  Reinforcement learning with hypothetical experience is in fact an incremental form of planning that is closely related to dynamic programming.  ... 
doi:10.1145/122344.122377 fatcat:edbzbh4mzzappnntpqrr4bot3m
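The two entries above describe the Dyna idea: act in the real environment, update value estimates directly, learn a world model from the same experience, and replay simulated ("hypothetical") transitions from that model as an incremental form of planning. Below is a minimal tabular Dyna-Q sketch of that loop; the environment interface, the deterministic one-step model, and the hyperparameters are assumptions made for illustration, not details taken from Sutton's papers.

```python
import random
from collections import defaultdict

def dyna_q(env_step, actions, start_state, is_terminal,
           episodes=100, max_steps=200, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """env_step(state, action) -> (reward, next_state) is an assumed interface."""
    Q = defaultdict(float)   # Q[(state, action)] value estimates
    model = {}               # model[(state, action)] = (reward, next_state)
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            if is_terminal(s):
                break
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            r, s2 = env_step(s, a)                      # real experience
            # direct reinforcement learning: one-step Q-learning update
            best_next = max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            # model learning from the same experience
            model[(s, a)] = (r, s2)
            # planning: replay hypothetical experience drawn from the learned model
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                best = max(Q[(ps2, a_)] for a_ in actions)
                Q[(ps, pa)] += alpha * (pr + gamma * best - Q[(ps, pa)])
            s = s2
    return Q
```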

Towards a Hands-Free Query Optimizer through Deep Learning [article]

Ryan Marcus, Olga Papaemmanouil
2018 arXiv   pre-print
In this vision paper, we argue that a new type of query optimizer, based on deep reinforcement learning, can drastically improve on the state-of-the-art.  ...  learning-based query optimizers.  ...  Similar incremental approaches have shown success in other applications of reinforcement learning [6, 9, 33].  ... 
arXiv:1809.10212v2 fatcat:k4srbxjwy5fd7nd7nqdcwpnoci
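The entry above is a vision paper, so as a rough toy illustration only (not the architecture the authors propose), here is one common way join ordering can be framed as a reinforcement learning problem: the state is the set of relations joined so far, the action is the next relation to add, and the reward is the negative of an estimated join cost. The tabular Q-learning and the `cost_of_join` callback are assumptions of the sketch.

```python
import random
from collections import defaultdict

def learn_join_order(relations, cost_of_join, episodes=500,
                     alpha=0.1, gamma=1.0, epsilon=0.2):
    """cost_of_join(joined_set, next_relation) -> float is an assumed cost model."""
    Q = defaultdict(float)  # Q[(frozenset of joined relations, next relation)]
    for _ in range(episodes):
        joined = frozenset([random.choice(relations)])
        while len(joined) < len(relations):
            remaining = [r for r in relations if r not in joined]
            # epsilon-greedy choice of the next relation to join
            if random.random() < epsilon:
                a = random.choice(remaining)
            else:
                a = max(remaining, key=lambda r: Q[(joined, r)])
            reward = -cost_of_join(joined, a)     # cheaper joins give higher reward
            nxt = joined | {a}
            rest = [r for r in relations if r not in nxt]
            future = max(Q[(nxt, r)] for r in rest) if rest else 0.0
            Q[(joined, a)] += alpha * (reward + gamma * future - Q[(joined, a)])
            joined = nxt
    return Q
```

A greedy walk over the learned Q-values (always picking the highest-valued next relation) then yields a left-deep join order for the trained query.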

Learning State Representations for Query Optimization with Deep Reinforcement Learning [article]

Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, S. Sathiya Keerthi
2018 arXiv   pre-print
These models are able to capture a high level understanding of their environment, enabling them to learn difficult dynamic tasks in a variety of domains.  ...  In the database field, query optimization remains a difficult problem. Our goal in this work is to explore the capabilities of deep reinforcement learning in the context of query optimization.  ...  Acknowledgements This project was supported in part by NSF grants IIS-1247469 and Teradata.  ... 
arXiv:1803.08604v1 fatcat:h2cytvx55re5hfzkdrg2l4gfoq

Bridging the Gap between Reinforcement Learning and Knowledge Representation: A Logical Off- and On-Policy Framework [article]

Emad Saad
2010 arXiv   pre-print
Moreover, we show that any model-free reinforcement learning problem in an MDP environment can be encoded as a SAT problem. The importance of that is model-free reinforcement  ...  We prove the correctness of our approach. We show that the complexity of finding an offline and online policy for a model-free reinforcement learning problem in our approach is NP-complete.  ...  Unlike the logical model-based reinforcement learning framework of [19] that uses normal hybrid probabilistic logic programs to encode model-based reinforcement learning problems, normal logic program  ... 
arXiv:1012.1552v1 fatcat:pqgnvdzv55gkxa6lyednmz57hu

Should I do that? using relational reinforcement learning and declarative programming to discover domain axioms

Mohan Sridharan, Ben Meadows
2016 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)  
For any given goal, unexplained failure of plans created by ASP-based inference is taken to indicate the existence of unknown domain axioms.  ...  The task of discovering these axioms is formulated as a reinforcement learning problem, and a relational representation is used to incrementally generalize from specific axioms identified over time.  ...  This work was supported in part by the US Office of Naval Research Science of Autonomy award N00014-13-1-0766. All opinions and conclusions expressed in this paper are those of the authors.  ... 
doi:10.1109/devlrn.2016.7846827 dblp:conf/icdl-epirob/SridharanM16 fatcat:b45ucz75cnhw5hitunspnlc5bm

Learning the structure of Factored Markov Decision Processes in reinforcement learning problems

Thomas Degris, Olivier Sigaud, Pierre-Henri Wuillemin
2006 Proceedings of the 23rd international conference on Machine learning - ICML '06  
planning algorithms based on fmdps with supervised learning techniques building structured representations of the problem.  ...  In this paper, we propose sdyna, a general framework for addressing large reinforcement learning problems by trial-and-error and with no initial knowledge of their structure. sdyna integrates incremental  ...  Indirect (or model-based ) rl algorithms build incrementally a model of the transition and reward functions.  ... 
doi:10.1145/1143844.1143877 dblp:conf/icml/DegrisSW06 fatcat:mfvy3wbddjfmnkj2qh64u3qzcm
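The snippet above describes SDYNA as combining FMDP-based planning with supervised learning of structured representations. Purely as a sketch of that structure-learning half, the function below fits one decision tree per next-state variable from observed transitions, giving a factored (DBN-like) transition model; the planning half is omitted, and using sklearn trees refit in batch (rather than incremental tree induction) is an assumption of the sketch, not how the paper describes it.

```python
from sklearn.tree import DecisionTreeClassifier

def fit_factored_model(transitions, n_vars):
    """transitions: list of (state, action, next_state), with state a tuple of ints
    and action an int; returns one classifier per next-state variable."""
    X = [list(s) + [a] for s, a, _ in transitions]
    models = []
    for i in range(n_vars):
        y = [s2[i] for _, _, s2 in transitions]       # value of variable i after the step
        models.append(DecisionTreeClassifier(max_depth=5).fit(X, y))
    return models  # models[i] predicts the next value of state variable i
```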

Integrating Reinforcement Learning and Declarative Programming to Learn Causal Laws in Dynamic Domains [chapter]

Mohan Sridharan, Sarah Rainge
2014 Lecture Notes in Computer Science  
The learned rules are, in turn, encoded in the ASP program and used to plan action sequences for subsequent tasks.  ...  This paper presents an architecture that combines the complementary strengths of Reinforcement Learning (RL) and declarative programming to support such commonsense reasoning and incremental learning of  ...  Office of Naval Research (ONR) Science of Autonomy Award N00014-13-1-0766. Opinions and conclusions are those of the authors and do not necessarily reflect the views of the ONR.  ... 
doi:10.1007/978-3-319-11973-1_33 fatcat:ezgru6dpmzf7fabsv3oj5qbcl4

Can I Do That? Discovering Domain Axioms Using Declarative Programming and Relational Reinforcement Learning [chapter]

Mohan Sridharan, Prashanth Devarakonda, Rashmica Gupta
2016 Lecture Notes in Computer Science  
For any given goal, unexplained failure of plans created by inference in the ASP program is taken to indicate the existence of unknown domain axioms.  ...  The task of learning these axioms is formulated as a Reinforcement Learning problem, and decision-tree regression with a relational representation is used to generalize from specific axioms identified  ...  All opinions and conclusions in this paper are those of the authors alone.  ... 
doi:10.1007/978-3-319-46840-2_3 fatcat:np6cr6wizvbbbfnqiylkyxc3ci
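The three entries above mention generalizing from specific axioms using decision-tree regression over a relational representation. As a generic illustration of that regression step only (the featurization and targets below are placeholders, not the authors' representation), one can regress observed returns onto relational features of state-action pairs:

```python
from sklearn.tree import DecisionTreeRegressor

def fit_value_generalizer(samples, featurize):
    """samples: list of (state, action, observed_return);
    featurize(state, action) -> list of numeric relational features (assumed)."""
    X = [featurize(s, a) for s, a, _ in samples]
    y = [g for _, _, g in samples]
    return DecisionTreeRegressor(max_depth=4).fit(X, y)
```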

Sequence learning: from recognition and prediction to sequential decision making

R. Sun, C.L. Giles
2001 IEEE Intelligent Systems  
and neural networks for implementing reinforcement learning [24] or combining symbolic planning and reinforcement learning to produce action sequences [25, 26].  ...  Researchers have also proposed combining  ...  So, it's logical that sequence learning is an important component of learning in many task domains of intelligent systems: inference, planning, reasoning, robotics, natural language processing, speech  ... 
doi:10.1109/mis.2001.1463065 fatcat:safqkf2ovnanteqrgfueamn7mi

Reinforcement learning is direct adaptive optimal control

Richard S. Sutton, Andrew G. Barto, Ronald J. Williams
1992 IEEE Control Systems  
Reinforcement learning is one of the major neural-network approaches to learning control. How should it be viewed from a control systems perspective?  ...  For concreteness, we focus on one reinforcement learning method (Q-learning) and on its analytically proven capabilities for one class of adaptive optimal control problems (Markov decision problems with  ...  Reinforcement learning is based on the common sense idea that if an action is followed by a satisfactory state of affairs, or by an improvement in the state of affairs (as determined  ... 
doi:10.1109/37.126844 fatcat:oiskjm5lxnbmrbn7vucqpwidka
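For reference, the Q-learning method this entry focuses on uses the standard one-step update, with step size \(\alpha\) and discount factor \(\gamma\):

\[
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
\]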

Trajectory Simulation Approach for Autonomous Vehicles Path Planning using Deep Reinforcement Learning

Jean Phelipe De Oliveira Lima, Raimundo Correa de Oliveira, Cleinaldo de Almeida Costa
2020 International journal for innovation education and research  
This work presents the use of Deep Reinforcement Learning for the task of path planning for autonomous vehicles through trajectory simulation, to define routes that offer greater safety (without collisions  ...  Such evaluation occurred in two phases: isolated evaluation, in which the models were inserted into the environment without prior knowledge; and incremental evaluation, in which models were inserted in  ...  For this activity, artificial intelligence techniques, especially reinforcement learning, have been applied, as is the case in [21] , in which models based on two Deep Reinforcement Learning techniques  ... 
doi:10.31686/ijier.vol8.iss12.2837 fatcat:3crcvhxurnc6poewh5zoya23vi

Learning by Knowledge Sharing in Autonomous Intelligent Systems [chapter]

Ramón García-Martínez, Daniel Borrajo, Pablo Maceri, Paola Britos
2006 Lecture Notes in Computer Science  
Operators are generated incrementally by combining rote learning, induction, and a variant of reinforcement learning.  ...  In order to achieve better learning convergence, several agents that learn separately are allowed to interchange each learned set of planning operators.  ...  Learning planning operators (what we will call operators is also referred to as action models within the reinforcement learning community) is achieved by observing the consequences of executing planned  ... 
doi:10.1007/11874850_17 fatcat:mcszqyzzmbezporn3oqazox44m

How to recommend preferable solutions of a user in interactive reinforcement learning?

Tomohiro Yamaguchi, Takuma Nishimura
2008 2008 SICE Annual Conference  
(Yamaguchi, 2006) block that is extended model-based reinforcement learning.  ...  Modified-PIA (Puterman, 2006) is one of the model-based reinforcement learning methods based on PIA modified for the  ...  How to Recommend Preferable Solutions of a User in Interactive Reinforcement Learning?, Advances in Reinforcement Learning, Prof.  ... 
doi:10.1109/sice.2008.4654999 fatcat:vc3xonkqbbgarmpps3hkrpvgim

Integrating hippocampus and striatum in decision-making

Adam Johnson, Matthijs AA van der Meer, A David Redish
2007 Current Opinion in Neurobiology  
Learning and memory and navigation literatures emphasize interactions between multiple memory systems: a flexible, planning-based system and a rigid, cached-value system.  ...  Evaluation of that prediction and subsequent action-selection probably occurs downstream (e.g. in orbitofrontal cortex, in ventral and dorsomedial striatum).  ...  We thank John Ferguson for helpful discussions, for comments on a draft of the manuscript, and for collecting the data used in the supplemental movie.  ... 
doi:10.1016/j.conb.2008.01.003 pmid:18313289 pmcid:PMC3774291 fatcat:rkeiphcq2jbhxerccr7exh23by
Showing results 1 — 15 out of 49,187 results