Perseus: Randomized Point-based Value Iteration for POMDPs

M. T.J. Spaan, N. Vlassis
2005 The Journal of Artificial Intelligence Research  
We present a randomized point-based value iteration algorithm called Perseus.  ...  Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space.  ...  Acknowledgments We would like to thank Bruno Scherrer, Geoff Gordon, Pascal Poupart, and the anonymous reviewers for their comments.  ... 
doi:10.1613/jair.1659 fatcat:tij66jweknbltfpjampdk2zkk4

Region enhanced neural Q-learning for solving model-based POMDPs

Marco A. Wiering, Thijs Kooi
2010 The 2010 International Joint Conference on Neural Networks (IJCNN)  
We compare RENQ to Qmdp and Perseus, two state-of-the-art algorithms for approximately solving model-based POMDPs.  ...  In this paper we introduce the RENQ algorithm, a new POMDP algorithm that combines neural networks for estimating Q-values with the construction of a spatial pyramid over the state space.  ...  Perseus is one such algorithm. D. Perseus Perseus is an approximate point-based value iteration algorithm for solving POMDPs and was introduced by Spaan and Vlassis in 2005 [21], [22].  ... 
doi:10.1109/ijcnn.2010.5596811 dblp:conf/ijcnn/WieringK10 fatcat:opzm3vjmcfd3dd2jmwd7qvspcm

Prioritizing Point-Based POMDP Solvers

G. Shani, R.I. Brafman, S.E. Shimony
2008 IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)  
Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods such as PBVI, Perseus, and HSVI, which quickly converge to an approximate solution for medium-sized  ...  We also present a new algorithm, Prioritized Value Iteration (PVI), and show empirically that it outperforms current point-based algorithms.  ...  b ∈ Succ(b) dist(B, b) 5: return B Point Based Value Iteration (PBVI) [7] (Algorithm 1) begins with b_0, and at each iteration computes an optimal value function for the current belief points set.  ... 
doi:10.1109/tsmcb.2008.928222 pmid:19022729 fatcat:5oygcscr7jcp3etvjb3ctddife

Prioritizing Point-Based POMDP Solvers [chapter]

Guy Shani, Ronen I. Brafman, Solomon E. Shimony
2006 Lecture Notes in Computer Science  
Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods such as PBVI, Perseus, and HSVI, which quickly converge to an approximate solution for medium-sized  ...  We also present a new algorithm, Prioritized Value Iteration (PVI), and show empirically that it outperforms current point-based algorithms.  ...  Prioritized Point Based Value Iteration Point based algorithms compute a value function using α vectors by iterating over some finite set of belief points and executing a sequence of backup operations  ... 
doi:10.1007/11871842_38 fatcat:apagesdzyjef5hrqmq4cfwfpdq

Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations [chapter]

Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mikko A. Uusitalo
2010 Lecture Notes in Computer Science  
, making computation intractable for large POMDPs.  ...  Many current methods store the policy partly through a set of "value vectors" which is updated at each iteration by planning one step further; the size of such vectors follows the size of the state space  ...  Point based value iteration algorithms scale to problems with thousands of states [4], which can still be insufficient.  ... 
doi:10.1007/978-3-642-15939-8_1 fatcat:45zzkohf6ralzlrp72h5pyxma4

Model-Based Online Learning of POMDPs [chapter]

Guy Shani, Ronen I. Brafman, Solomon E. Shimony
2005 Lecture Notes in Computer Science  
In this paper we present a novel method for learning a POMDP model online, based on McCallum's Utile Suffix Memory (USM), in conjunction with an approximate policy obtained using an incremental POMDP solver  ...  The model-based approach (learning a POMDP model of the world, and computing an optimal policy for the learned model) may generate superior results in the presence of sensor noise, but learning and solving  ...  Acknowledgments Partially supported by the Israeli Ministry of Science Infrastructure grant No. 3-942, by the Lynn and William Frankel Center for Computer Sciences, and by the Paul Ivanier Center for Robotics  ... 
doi:10.1007/11564096_35 fatcat:u4innixskzbh5gylbrmqml3yhu

Robot Planning in Partially Observable Continuous Domains

Josep M. Porta, Matthijs T. J. Spaan, Nikos Vlassis
2005 Robotics: Science and Systems I  
Finally, we demonstrate PERSEUS, our previously proposed randomized point-based value iteration algorithm, in a simple robot planning problem with a continuous domain, where encouraging results are observed  ...  We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces.  ...  Zajdel for their contributions to the work reported here, and the four reviewers for their detailed comments. J.M.  ... 
doi:10.15607/rss.2005.i.029 dblp:conf/rss/PortaSV05 fatcat:3biycxhp7zbvvgu763i6wjpqjy

A novel orthogonal NMF-based belief compression for POMDPs

Xin Li, William K. W. Cheung, Jiming Liu, Zhili Wu
2007 Proceedings of the 24th international conference on Machine learning - ICML '07  
in a value-directed manner so that the value function will take the same values for corresponding belief states in the original and compressed state spaces.  ...  In this paper, we propose a novel orthogonal non-negative matrix factorization (O-NMF) for the projection.  ...  Acknowledgments We would like to thank the anonymous reviewers for their useful and insightful comments.  ... 
doi:10.1145/1273496.1273564 dblp:conf/icml/LiCLW07 fatcat:oowib3qzjbde7c6rsjnmk4acy4

Anytime Point-Based Approximations for Large POMDPs

J. Pineau, G. Gordon, S. Thrun
2006 The Journal of Artificial Intelligence Research  
The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI).  ...  A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex.  ...  We also thank Darius Braziunas, Pascal Poupart, Trey Smith and Nikos Vlassis, for conversations regarding their algorithms and results.  ... 
doi:10.1613/jair.2078 fatcat:gk4phgieufczjkm5zlegm4jj4m
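
The snippet above mentions performing value backups at specific belief points rather than over the entire belief simplex. A minimal sketch of one such backup follows, in plain Python. It is not taken from the paper: the function name `point_based_backup`, the array layout (`T[a][s][s2]` for transitions, `O[a][s2][o]` for observations, `R[a][s]` for rewards) and the toy model are assumptions made here for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b, V, T, O, R, gamma):
    """Return one new alpha vector for belief b, given the current set V.

    Assumed (hypothetical) layout:
      b            belief, a list of probabilities over states
      V            non-empty list of alpha vectors (lists over states)
      T[a][s][s2]  probability of moving from s to s2 under action a
      O[a][s2][o]  probability of observing o in s2 after action a
      R[a][s]      immediate reward for action a in state s
    """
    n_states = len(b)
    n_obs = len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(len(R)):
        g_a = list(R[a])
        for o in range(n_obs):
            # Project every alpha vector one step back through (a, o).
            projected = [
                [
                    sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                        for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in V
            ]
            # Keep only the projection that is best at this belief point.
            best = max(projected, key=lambda g: dot(b, g))
            g_a = [x + gamma * y for x, y in zip(g_a, best)]
        val = dot(b, g_a)
        if val > best_val:
            best_vec, best_val = g_a, val
    return best_vec


# Toy model (made up): 2 states, 1 action, 1 observation, states never change.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[1.0], [1.0]]]
R = [[1.0, 0.0]]
alpha = point_based_backup([0.5, 0.5], [[0.0, 0.0]], T, O, R, gamma=0.5)
```

The backup returns a single vector per belief point, so the size of the value function is bounded by the number of points instead of growing with every exact update.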

Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs

Christopher Amato, Daniel S. Bernstein, Shlomo Zilberstein
2009 Autonomous Agents and Multi-Agent Systems  
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty.  ...  Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.  ...  Acknowledgments We would like to thank Marek Petrik for his helpful comments. Support for this work was provided in part by the National Science Foundation under Grant No.  ... 
doi:10.1007/s10458-009-9103-z fatcat:omfbzmsktjfvtospdznj6mqsea

A survey of point-based POMDP solvers

Guy Shani, Joelle Pineau, Robert Kaplow
2012 Autonomous Agents and Multi-Agent Systems  
This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces  ...  In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas.  ...  Funding for this work was provided by the Natural Sciences and Engineering Research Council of Canada.  ... 
doi:10.1007/s10458-012-9200-2 fatcat:hkmlt3wm65bh3hxuzfktdwyjyu

Evaluating Point-Based POMDP Solvers on Multicore Machines

Guy Shani
2010 IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)  
Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods which quickly provide approximate solutions for mid-sized problems.  ...  In this paper we evaluate several ways in which point-based algorithms can be adapted to parallel computing.  ...  The Perseus algorithm (Algorithm 4) then iterates over these points in a random order.  ... 
doi:10.1109/tsmcb.2009.2034015 pmid:19914897 fatcat:pxgy2ru2lbgvhah4bbra4gainy

On defect propagation in multi-machine stochastically deteriorating systems with incomplete information

Rakshita Agrawal, Matthew J. Realff, Jay H. Lee
2012 Journal of Process Control  
The resulting (significantly large sized) POMDPs are solved using a point based method called PERSEUS, and the results are compared with those obtained by conventionally used periodic policies.  ...  The resulting maintenance and inspection problem is extensively studied for a single machine system by using the framework of Partially Observable Markov Decision Processes (POMDPs).  ...  Similar to the value iteration for MDPs, the value update step for a belief point b is shown in (7).  ... 
doi:10.1016/j.jprocont.2012.01.018 fatcat:cdzwste6hfbhpnn6dlcckpkh7m

Point-Based Value Iteration for Finite-Horizon POMDPs

Erwin Walraven, Matthijs T. J. Spaan
2019 The Journal of Artificial Intelligence Research  
Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used.  ...  Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality.  ...  Acknowledgments The research in this paper is funded by the Netherlands Organisation for Scientific Research (NWO), as part of the Uncertainty Reduction in Smart Energy Systems (URSES) program.  ... 
doi:10.1613/jair.1.11324 fatcat:7rld6jsrg5batkv6ihzov3avny

Incremental least squares policy iteration in reinforcement learning for control

Chun-Gui Li, Meng Wang, Shu-Hong Yang
2008 2008 International Conference on Machine Learning and Cybernetics  
As the ILSPI is based on belief sample points, it represents a point-based policy iteration method.  ...  We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs).  ...  The resulting algorithm, point-based value iteration (PBVI), has proven to be a practical POMDP solution scaling up to large problems.  ... 
doi:10.1109/icmlc.2008.4620736 fatcat:ir5njdbranh3rlpzvkgs7jg7gq