
Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal [article]

Alekh Agarwal, Sham Kakade, Lin F. Yang
2020 arXiv   pre-print
This work considers the sample and computational complexity of obtaining an ϵ-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model.  ...  We ask arguably the most basic and unresolved question in model based planning: is the naive "plug-in" approach, non-asymptotically, minimax optimal in the quality of the policy it finds, given a fixed  ...  We thank Csaba Szepesvari, Kaiqing Zhang, and Mohammad Gheshlaghi Azar for helpful discussions and pointing out typos in the initial version of the paper. S.  ... 
arXiv:1906.03804v3 fatcat:yj6s7237cndppjirz7x6by4wna
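The naive "plug-in" approach this abstract asks about admits a compact sketch: draw a fixed number of samples per state-action pair from the generative model, form the empirical MDP, and plan in it with standard value iteration. The toy setup and all function names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def estimate_model(sample, n_states, n_actions, n_samples):
    """Build the empirical transition model P_hat[s, a, s'] by calling the
    generative model `sample(s, a) -> s'` n_samples times per (s, a) pair."""
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample(s, a)] += 1.0 / n_samples
    return P_hat

def value_iteration(P, R, gamma=0.9, iters=500):
    """Plan in the (empirical) MDP: return the greedy policy and value function."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)  # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V
```

The paper's question is whether this estimate-then-plan pipeline, at the minimal sample size, already achieves minimax-optimal policy quality; the sketch above only illustrates the pipeline itself.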

Bayesian sparse sampling for on-line reward optimization

Tao Wang, Daniel Lizotte, Michael Bowling, Dale Schuurmans
2005 Proceedings of the 22nd international conference on Machine learning - ICML '05  
We present an efficient "sparse sampling" technique for approximating Bayes optimal decision making in reinforcement learning, addressing the well known exploration versus exploitation tradeoff.  ...  value of perfect information).  ...  Acknowledgments Research supported by the Alberta Ingenuity Centre for Machine Learning, CRC, NSERC, MITACS and CFI.  ... 
doi:10.1145/1102351.1102472 dblp:conf/icml/WangLBS05 fatcat:pwpcdttqufbttklck3qpfxbsua
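Sparse sampling in this sense builds a lookahead tree of fixed width and depth from a generative model, so the per-decision cost is independent of the size of the state space. A minimal, non-Bayesian sketch (the simulator interface and names are assumptions for illustration):

```python
def sparse_sample_q(sim, state, actions, depth, width, gamma=0.95):
    """Estimate Q(state, a) for each action by drawing `width` sampled
    successors per action from the generative model `sim(s, a) -> (s', r)`
    and recursing to `depth` levels of lookahead."""
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = sim(state, a)
            future = sparse_sample_q(sim, next_state, actions, depth - 1, width, gamma)
            total += reward + gamma * max(future.values())
        q[a] = total / width
    return q
```

The Bayesian variant this paper proposes samples from a posterior over models rather than a single fixed simulator, directing samples by (an approximation of) the value of information.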

QueryPOMDP: POMDP-Based Communication in Multiagent Systems [chapter]

Francisco S. Melo, Matthijs T. J. Spaan, Stefan J. Witwicki
2012 Lecture Notes in Computer Science  
Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) provide powerful modeling tools for multiagent decision-making in the face of uncertainty, but solving these models comes at a  ...  In this paper, we focus on the interplay between these concepts, namely how sparse interactions impact the communication needs.  ...  A partially observable Markov decision process (POMDP) is a 1-agent Dec-POMDP and a Markov decision process (MDP) is a 1-agent Dec-MDP.  ... 
doi:10.1007/978-3-642-34799-3_13 fatcat:ebhtc4qcdnhpzjydcnpc37hd2u

Approximate planning and verification for large Markov decision processes

Richard Lassaigne, Sylvain Peyronnet
2014 International Journal on Software Tools for Technology Transfer (STTT)  
We study the planning and verification problems for very large or infinite probabilistic systems, like Markov Decision Processes (MDPs), from a complexity point of view.  ...  More precisely, we deal with the problem of designing an efficient approximation method to compute a near-optimal policy for the planning problem of MDPs and the satisfaction probabilities of interesting  ...  ACKNOWLEDGMENTS The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1007/s10009-014-0344-z fatcat:vulxdvstizftllcwwtickb6hri

Approximate planning and verification for large markov decision processes

Richard Lassaigne, Sylvain Peyronnet
2012 Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC '12  
We study the planning and verification problems for very large or infinite probabilistic systems, like Markov Decision Processes (MDPs), from a complexity point of view.  ...  More precisely, we deal with the problem of designing an efficient approximation method to compute a near-optimal policy for the planning problem of MDPs and the satisfaction probabilities of interesting  ...  ACKNOWLEDGMENTS The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1145/2245276.2231984 dblp:conf/sac/LassaigneP12 fatcat:g2feqtk5jfd53k2hgznji4busi

Page 3275 of Psychological Abstracts Vol. 89, Issue 8 [page]

2002 Psychological Abstracts  
(U Pennsylvania, Dept of Computer & Information Science, Philadelphia, PA) A sparse sampling algorithm for near-optimal planning in large Markov decision processes.  ...  —A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP.  ... 

Research on Sports Dance Video Recommendation Method Based on Style

Jiangtao Sun, Haiying Tang, Jie Liu
2022 Scientific Programming  
Video recommendation is particularly important for the improvement of teaching quality. A sports dance video recommendation method based on style is proposed.  ...  The factorization machine model is used to combine features and process high-dimensional sparse features, the deep neural network model is adopted as the value function network of the deep Q-learning algorithm  ...  Acknowledgments This study was supported by the 2020 Provincial Teaching Research Project of Hubei "Practice and research on 'online and offline' mixed teaching of sports dance" (No. 2020736).  ... 
doi:10.1155/2022/7089057 fatcat:gort4trw3rcmfjgvblw2p7s4je

Bayesian Reinforcement Learning via Deep, Sparse Sampling [article]

Divya Grover, Debabrota Basu, Christos Dimitrakakis
2020 arXiv   pre-print
We address the problem of Bayesian reinforcement learning using efficient model-based online planning.  ...  We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal policy, with a lower computational  ...  BAMCP takes a Monte-Carlo approach to sparse lookahead in a belief-augmented version of the Markov decision process.  ... 
arXiv:1902.02661v4 fatcat:bwofz6wmjfamdm7avnwkrzcfm4

An Intelligent Assistant for Power Plants Based on Factored MDPs

A. Reyes, M.T.J. Spaan, L.E. Sucar
2009 2009 15th International Conference on Intelligent System Applications to Power Systems  
This paper introduces AsistO, an intelligent assistant for decision support based on decision-theoretic planning techniques.  ...  We present the formalism of Markov decision processes as the core of the intelligent assistant, which uses a factored representation of plant states.  ...  We explained the formalism of factored Markov decision processes and an intuitive algorithm to approximate decision models based on training data.  ... 
doi:10.1109/isap.2009.5352822 fatcat:w6fpgvqb4rae5owj2pwafkefhe

Information-Driven Path Planning

Shi Bai, Tixiao Shan, Fanfei Chen, Lantao Liu, Brendan Englot
2021 Current Robotics Reports  
Summary This review started with the fundamental building blocks of informative planning for environment modeling and monitoring, followed by integration with machine learning, emphasizing how machine  ...  Purpose of Review The era of robotics-based environmental monitoring has given rise to many interesting areas of research.  ...  ., decision theoretic planning based on the Markov Decision Process [16, 98] , stochastic optimal control [11, 47] , and stochastic model predictive control [100, 108] have been broadly used to cope  ... 
doi:10.1007/s43154-021-00045-6 fatcat:cfnlvoaacjhptlbdtk6fhjm37m

Survey On Intelligent Data Repository Using Soft Computing

A. Prema, A.Pethalakshmi
2015 Zenodo  
Moreover, the ETL hybridization with fuzzy optimization, the Markov Decision model, decision-making criteria, and the Decision Matrix has also been reviewed.  ...  This chapter focuses on a review of the literature for Extraction, Transform and Load (ETL) with the Data Warehouse.  ...  This summary is made on the aspects of modeling and fuzzy optimization, classification and formulation for the fuzzy optimization problems, models and methods [85].  ... 
doi:10.5281/zenodo.32280 fatcat:q2zh46vpofbq5mbojvjtd6zig4

Reinforcement Learning for Precision Oncology

Jan-Niklas Eckardt, Karsten Wendt, Martin Bornhäuser, Jan Moritz Middeke
2021 Cancers  
develop RL-based decision support systems for precision oncology.  ...  The growing complexity of medical data has led to the implementation of machine learning techniques that are vastly applied for risk assessment and outcome prediction using either supervised or unsupervised  ...  Code Availability [45] Development of adaptive fractionation schemes based on mathematical modeling with a Markov decision process; simulated environment of target volumes and  ... 
doi:10.3390/cancers13184624 pmid:34572853 fatcat:psrib4gwbvgkhgmypbwv53aemu

Research and Challenges of Reinforcement Learning in Cyber Defense Decision-Making for Intranet Security

Wenhao Wang, Dingyuanhao Sun, Feng Jiang, Xingguo Chen, Cheng Zhu
2022 Algorithms  
We propose a framework that defines four modules based on the life cycle of threats: pentest, design, response, recovery.  ...  It is urgent to rethink network defense from the perspective of decision-making, and prepare for every possible situation.  ...  The decision-making model has a variety of formal abstractions, in which the most classic one is the Markov Decision Process (MDP) [15] .  ... 
doi:10.3390/a15040134 fatcat:an3gyhnyzve6jj5r74lvqj6eki

Robotic Planning under Uncertainty in Spatiotemporal Environments in Expeditionary Science [article]

Victoria Preston, Genevieve Flaspohler, Anna P. M. Michel, John W. Fisher III, Nicholas Roy
2022 arXiv   pre-print
We formalize expeditionary science as a sequential decision-making problem, modeled using the language of partially-observable Markov decision processes (POMDPs).  ...  they focus on information gathering as opposed to scientific task execution, and they make use of decision-making approaches that scale poorly to large, continuous problems with long planning horizons  ...  Our contributions, including PLUMES [6, 7] , macro-action discovery [8] , the PHUMES model and trajectory optimizer for operational missions, and ongoing work in physically-informed deep kernel learning  ... 
arXiv:2206.01364v1 fatcat:aps5xtj6xjdhlmt2gkav5fx7ee

Partially Observable Markov Decision Processes [chapter]

Thomas Zeugmann, Pascal Poupart, James Kennedy, Xin Jin, Jiawei Han, Lorenza Saitta, Michele Sebag, Jan Peters, J. Andrew Bagnell, Walter Daelemans, Geoffrey I. Webb, Kai Ming Ting (+12 others)
2011 Encyclopedia of Machine Learning  
A Markov Decision Process (MDP) is defined by ⟨S, A, T, R⟩ • State S: the current description of the world (Markov: the past is irrelevant once we know the state); navigation example  ...  • Long horizons and macro-actions • Online search  ...  POMDP: powerful but intractable • A Partially Observable Markov Decision Process (POMDP) is a very powerful modeling tool, but with great power  ... 
doi:10.1007/978-0-387-30164-8_629 fatcat:hj6hnbjtn5fshpq4jnxpsaj7dq
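The belief-state machinery that makes POMDPs powerful but intractable starts from the Bayes filter: after acting and observing, the belief over states is pushed through the dynamics and reweighted by the observation likelihood. A minimal sketch (the array layouts and the tiger-style example are assumptions, not from this chapter):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """POMDP belief update: b'(s') is proportional to O[a, s', o] * sum_s b(s) * T[s, a, s']."""
    predicted = b @ T[:, a, :]      # prediction step: push belief through the dynamics
    new_b = O[a, :, o] * predicted  # correction step: weight by observation likelihood
    return new_b / new_b.sum()      # renormalize to a probability distribution
```

Planning then happens over this continuous belief simplex rather than the finite state set, which is the source of the intractability the entry alludes to.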