
Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies [article]

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
2016 arXiv   pre-print
The open problem is to come up with a method to find an exact or an approximate optimal stochastic memoryless policy for POMDP models.  ...  In general, finding the optimal policy for the POMDP model is computationally intractable and fully non-convex, even for the class of memoryless policies.  ...  Therefore, planning is a problem of finding the optimal memoryless policy, under uncertainty, in the class of stochastic memoryless policies.  ... 
arXiv:1608.04996v1 fatcat:nvjfyziw3za45d2fnix36mk2za
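
Illustrative note: a stochastic memoryless (reactive) policy maps the current observation directly to a distribution over actions, with no belief tracking or history. The sketch below shows what such a policy object looks like; all sizes and names are chosen here for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stochastic memoryless policy for a POMDP: the agent conditions only on the
# current observation, never on a belief state or the history of observations.
n_obs, n_act = 4, 3
policy = rng.dirichlet(np.ones(n_act), size=n_obs)   # policy[o, a] = P(a | o)

def act(observation: int) -> int:
    """Sample an action from the memoryless policy given the current observation."""
    return rng.choice(n_act, p=policy[observation])

# Example: a single decision made from observation 2 alone.
print(act(2))
```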

On the Computational Complexity of Stochastic Controller Optimization in POMDPs [article]

Nikos Vlassis, Michael L. Littman, David Barber
2012 arXiv   pre-print
Our result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard.  ...  The corresponding decision problem is NP-hard, in PSPACE, and SQRT-SUM-hard, hence placing it in NP would imply breakthroughs in long-standing open problems in computer science.  ...  Acknowledgments The first author would like to thank Constantinos Daskalakis, Michael Tsatsomeros, John Tsitsiklis, and Steve Vavasis for helpful discussions.  ... 
arXiv:1107.3090v2 fatcat:zy6lbraq2zbjdlqm46a7ldyn6m

On the Computational Complexity of Stochastic Controller Optimization in POMDPs

Nikos Vlassis, Michael L. Littman, David Barber
2012 ACM Transactions on Computation Theory  
Our result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard.  ...  The corresponding decision problem is NP-hard, in PSPACE, and sqrt-sum-hard, hence placing it in NP would imply breakthroughs in long-standing open problems in computer science.  ...  Acknowledgments We are grateful to Marek Petrik for his feedback and for pointing out an error in an earlier version.  ... 
doi:10.1145/2382559.2382563 fatcat:k5noabwolnfjpoj7bc4ww5tgc4
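
Illustrative note: while optimizing a stochastic memoryless controller is NP-hard as this paper shows, evaluating a fixed controller on a known POMDP only requires solving a linear system over the latent states. A hedged sketch of that evaluation step; the tensor layouts and the observation model (observations emitted by the current state) are assumptions for illustration.

```python
import numpy as np

# Evaluate a fixed stochastic memoryless controller pi[o, a] on a known POMDP.
# T[s, a, s'] transition probs, O[s, o] observation probs, R[s, a] rewards,
# gamma discount.  The state-value vector V solves V = r_pi + gamma * P_pi V.
def evaluate_reactive_policy(T, O, R, pi, gamma=0.95):
    n_s, n_a, _ = T.shape
    # Effective per-state action probabilities: P(a | s) = sum_o O[s, o] * pi[o, a]
    a_given_s = O @ pi                                   # shape (n_s, n_a)
    P_pi = np.einsum('sa,sat->st', a_given_s, T)         # induced Markov chain
    r_pi = np.einsum('sa,sa->s', a_given_s, R)           # expected one-step reward
    return np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)
```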

Learning and planning in environments with delayed feedback

Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman
2008 Autonomous Agents and Multi-Agent Systems  
This work considers the problems of learning and planning in Markovian environments with constant observation and reward delays.  ...  We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics.  ...  Acknowledgements This work was supported in part by NSF IIS award 0329153. We thank the First Annual Reinforcement Learning Competition and Adam White.  ... 
doi:10.1007/s10458-008-9056-7 fatcat:vsdsrqig3rhankwfoz5w2ejdry
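
Illustrative note: a standard reduction for constant observation delays augments the delayed observation with the buffer of the most recent k actions, which restores the Markov property at the cost of a larger state space. The sketch below is a generic version of that construction, not the authors' code; the class and method names are hypothetical.

```python
from collections import deque

# With a constant observation delay of k steps, at time t the agent only sees
# the state from time t-k.  Pairing that delayed observation with the k actions
# taken since then yields an augmented state the agent can plan over.
class DelayedObservationWrapper:
    def __init__(self, delay_k: int, initial_action=0):
        self.k = delay_k
        self.pending_actions = deque([initial_action] * delay_k, maxlen=delay_k)

    def augmented_state(self, delayed_observation):
        # The planner/learner treats (delayed observation, last k actions) as its state.
        return (delayed_observation, tuple(self.pending_actions))

    def record_action(self, action):
        self.pending_actions.append(action)
```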

Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks [article]

Steven Carr, Nils Jansen, Ralf Wimmer, Alexandru C. Serban, Bernd Becker, Ufuk Topcu
2019 arXiv   pre-print
Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.  ...  First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP.  ...  While we cannot guarantee optimality, our approach shows results that are often close to the actual optimum with competitive computation times for large problem domains.  ... 
arXiv:1903.08428v2 fatcat:a5n3uyf44fa5xjce66x6fsisaq

Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks

Steven Carr, Nils Jansen, Ralf Wimmer, Alexandru Serban, Bernd Becker, Ufuk Topcu
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.  ...  First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP.  ...  In order to cope with arbitrary memory in POMDPs, policy gradient methods need some notion of memory.  ... 
doi:10.24963/ijcai.2019/768 dblp:conf/ijcai/Carr0WS0T19 fatcat:cgsiuwgxwbedtex5kver6wlrxe
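
Illustrative note: the RNN described in the abstract conditions actions on the observation history through its hidden state, so no explicit belief-space expansion is needed. Below is a minimal PyTorch-style sketch of such a policy network; the architecture, layer sizes, and names are assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

# An RNN policy maps an observation sequence to per-step action distributions;
# the hidden state acts as a learned summary statistic of the history.
class RNNPolicy(nn.Module):
    def __init__(self, n_obs: int, n_act: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_obs, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_act)

    def forward(self, obs_seq, h=None):
        # obs_seq: (batch, time) integer observation indices
        x, h = self.rnn(self.embed(obs_seq), h)
        return torch.softmax(self.head(x), dim=-1), h   # action probs per step
```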

[Re] Faster Teaching via POMDP Planning

Lukas Brückner, Aurélien Nioche
2020 Zenodo  
We open source our implementation in Python and extend the description of the learner models with explicit formulas for the belief update, as well as an extended description of the planning algorithm,  ...  While the POMDP policies outperform the random baselines overall, a clear advantage over the policy based on maximum information gain cannot be seen.  ...  7 Author contributions AN and LB designed the replication. LB implemented the model, performed the subsequent analysis, and wrote the paper. AN supervised the process.  ... 
doi:10.5281/zenodo.4242943 fatcat:2agp5kvzdvhylbhb4cixeop2b4
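
Illustrative note: the belief update the report makes explicit is, for a generic discrete POMDP, b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s). A minimal sketch of that standard update (the tensor layouts are assumptions for illustration and are not specific to the reproduced paper):

```python
import numpy as np

# Standard discrete POMDP belief update after taking action a and observing o.
# T[s, a, s'] = P(s' | s, a), O[a, s', o] = P(o | s', a), b[s] = current belief.
def belief_update(b, a, o, T, O):
    predicted = b @ T[:, a, :]            # sum_s T[s, a, s'] * b[s]
    unnormalized = O[a, :, o] * predicted
    return unnormalized / unnormalized.sum()
```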

Enforcing Almost-Sure Reachability in POMDPs [chapter]

Sebastian Junges, Nils Jansen, Sanjit A. Seshia
2021 Lecture Notes in Computer Science  
In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification.  ...  We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state.  ...  The problem of determining any winning policy can be cast as a strong cyclic planning problem, proposed earlier with decision diagrams [7] .  ... 
doi:10.1007/978-3-030-81688-9_28 fatcat:g2iwyuuxsrcbxkli46wjopib6m
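
Illustrative note: the paper computes winning regions over belief supports of a POMDP; as a deliberately simplified stand-in for the underlying fixed point, the sketch below computes the almost-sure reach-avoid winning region of a fully observable MDP. It is not the paper's construction, and the data layout is assumed for illustration.

```python
# Set of states from which some policy reaches `goal` with probability 1
# while never visiting a state outside `safe` (fully observable MDP case).
def almost_sure_winning_region(succ, goal, safe):
    # succ[s][a] = set of possible successor states of action a in state s.
    win = set(safe) | set(goal)
    while True:
        # States that can still reach the goal using only actions whose
        # successors all stay inside the current candidate set `win`.
        reach = set(goal) & win
        grew = True
        while grew:
            grew = False
            for s in win - reach:
                for succs in succ[s].values():
                    if succs <= win and succs & reach:
                        reach.add(s)
                        grew = True
                        break
        if reach == win:
            return win
        win = reach

# Tiny example: state 2 is the goal; state 1 only self-loops and never reaches it.
succ = {0: {'a': {0, 2}, 'b': {1}}, 1: {'a': {1}}, 2: {}}
print(almost_sure_winning_region(succ, goal={2}, safe={0, 1, 2}))   # {0, 2}
```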

Reinforcement Learning of POMDPs using Spectral Methods [article]

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
2016 arXiv   pre-print
At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model.  ...  We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.  ...  As stated earlier, contextual MDPs are a special class of POMDPs for which memoryless policies are optimal.  ... 
arXiv:1602.07764v2 fatcat:p7fqj553tbb27ayxwffeykmguy

Monte Carlo Bayesian Reinforcement Learning [article]

Yi Wang
2012 arXiv   pre-print
Theoretical and experimental results show that the discrete POMDP approximates the underlying BRL task well with guaranteed performance.  ...  Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them.  ...  The PEGASUS analysis bounds the number of samples required to find a good policy in a policy class with finite VC-dimension. Our result does not assume such a policy class.  ... 
arXiv:1206.6449v1 fatcat:avicgfxdvbfabmazt2mdonanae
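
Illustrative note: the Monte Carlo idea is to replace the continuous distribution over model parameters with a finite set of sampled candidate MDPs, so the BRL task becomes a discrete POMDP whose hidden component is the identity of the true model. A hedged sketch with a Dirichlet prior assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw K candidate MDPs from a prior over transition parameters; the belief over
# which candidate is the true model is then updated from observed transitions.
def sample_candidate_mdps(n_states, n_actions, K, alpha=1.0):
    # Dirichlet prior over each row T[s, a, :] of the transition matrix
    # (an assumption for this sketch; the paper's prior may differ).
    return [rng.dirichlet(alpha * np.ones(n_states), size=(n_states, n_actions))
            for _ in range(K)]

def update_model_belief(belief, models, s, a, s_next):
    # Posterior over which sampled model generated the observed transition.
    likelihoods = np.array([m[s, a, s_next] for m in models])
    posterior = belief * likelihoods
    return posterior / posterior.sum()
```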

Variational Inference for Data-Efficient Model Learning in POMDPs [article]

Sebastian Tschiatschek, Kai Arulkumaran, Jan Stühmer, Katja Hofmann
2018 arXiv   pre-print
Today, effective planning approaches exist that generate effective strategies given black-box models of a POMDP task. Yet, an open question is how to acquire accurate models for complex domains.  ...  We empirically show that our model leads to effective control strategies when coupled with state-of-the-art planners.  ...  Work on planning in POMDPs was able to progress while abstracting from the problem of model acquisition.  ... 
arXiv:1805.09281v1 fatcat:pqvowa4pdnazbpaszw4yqcvsxy

Information-Theoretic Methods for Planning and Learning in Partially Observable Markov Decision Processes [article]

Roy Fox
2017 arXiv   pre-print
First, we formulate the problem of optimizing the agent under both extrinsic and intrinsic constraints and develop the main tools for solving it.  ...  In this dissertation, we model these constraints as information-rate constraints on communication channels connecting these various internal components of the agent.  ...  RF and NT are supported by the DARPA MSEE Program, the Gatsby Charitable Foundation, the Israel Science Foundation and the Intel ICRI-CI Institute.  ... 
arXiv:1609.07672v2 fatcat:boaf3xokwnb4dlty5cyvf5xrrm

Formal models and algorithms for decentralized decision making under uncertainty

Sven Seuken, Shlomo Zilberstein
2008 Autonomous Agents and Multi-Agent Systems  
A better understanding of these issues will facilitate further progress in the field and help resolve several open problems that we identify.  ...  Rapid progress in recent years has produced a number of different frameworks, complexity results, and planning algorithms.  ...  Any opinions, findings, conclusions or recommendations expressed in this manuscript are those of the authors and do not reflect the views of the US government.  ... 
doi:10.1007/s10458-007-9026-5 fatcat:35cpuzfixvh6fecubllavumsjm

Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings

Ranjit Nair, Milind Tambe, Makoto Yokoo, David V. Pynadath, Stacy Marsella
2003 International Joint Conference on Artificial Intelligence  
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP).  ...  Yet, despite the growing importance and applications of decentralized POMDP models in the multiagent arena, few algorithms have been developed for efficiently deriving joint policies for these models.  ...  Acknowledgments We thank Piotr Gmytrasiewicz for discussions related to the paper. This research was supported by NSF grant 0208580 and DARPA award no. F30602-98-2-0108.  ... 
dblp:conf/ijcai/NairTYPM03 fatcat:3tj4afiu6fft3lo7ckx6acaa2q

Policy search for multi-robot coordination under uncertainty

Christopher Amato, George Konidaris, Ariel Anders, Gabriel Cruz, Jonathan P How, Leslie P Kaelbling
2016 The international journal of robotics research  
We introduce a principled method for multi-robot coordination based on a generic model (termed a MacDec-POMDP) of multi-robot cooperative planning in the presence of stochasticity, uncertain sensing and  ...  We present a new MacDec-POMDP planning algorithm that searches over policies represented as finite-state controllers, rather than the existing policy tree representation.  ...  Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.  ... 
doi:10.1177/0278364916679611 fatcat:7ppagayikbeivhzzyxdvwjouwu
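
Illustrative note: a finite-state controller represents a policy with a bounded set of internal nodes, each selecting a (macro-)action, with observations driving stochastic transitions between nodes; unlike a policy tree, its size does not grow with the planning horizon. A minimal per-robot sketch; the parameterization and names are chosen here for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class FiniteStateController:
    def __init__(self, n_nodes: int, n_actions: int, n_obs: int):
        # psi(a | q): action distribution at each controller node
        self.action_probs = rng.dirichlet(np.ones(n_actions), size=n_nodes)
        # eta(q' | q, o): node transition distribution given an observation
        self.node_trans = rng.dirichlet(np.ones(n_nodes), size=(n_nodes, n_obs))
        self.node = 0

    def select_action(self) -> int:
        return rng.choice(self.action_probs.shape[1], p=self.action_probs[self.node])

    def observe(self, observation: int) -> None:
        # Move to the next internal node based on the received observation.
        self.node = rng.choice(self.node_trans.shape[2],
                               p=self.node_trans[self.node, observation])
```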