979 Hits in 4.4 sec

Graphical models for interactive POMDPs: representations and solutions

Prashant Doshi, Yifeng Zeng, Qiongyu Chen
2008 Autonomous Agents and Multi-Agent Systems  
We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision  ...  I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs.  ...  Acknowledgment: Prashant Doshi was supported in part by grant #FA9550-08-1-0429 from the US Air Force Office of Scientific Research (AFOSR) and in part by a grant from the  ... 
doi:10.1007/s10458-008-9064-7 fatcat:s4cysjljpfbs7fqeuh5luci4km
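
The snippet's relationship — I-DIDs are to DIDs what I-POMDPs are to POMDPs — rests on the notion of an interactive state that couples the physical state with a model of the other agent. A minimal Python rendering of that idea, with all names invented for illustration rather than taken from the paper:

```python
# Toy sketch of an I-POMDP "interactive state": agent i reasons over the
# physical state together with a model of agent j, which is itself a belief
# plus a frame. Names are illustrative, not the paper's notation.
from dataclasses import dataclass

@dataclass
class ModelOfJ:
    belief: dict   # j's belief over physical states
    frame: str     # j's assumed frame (capabilities, preferences)

@dataclass
class InteractiveState:
    physical: str          # physical state of the environment
    model_of_j: ModelOfJ   # i's model of the other agent

s = InteractiveState("door_open",
                     ModelOfJ({"door_open": 0.7, "door_closed": 0.3}, "helper"))
```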

Graphical models for online solutions to interactive POMDPs

Prashant Doshi, Yifeng Zeng, Qiongyu Chen
2007 Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems - AAMAS '07  
We develop a new graphical representation for interactive partially observable Markov decision processes (I-POMDPs) that is significantly more transparent and semantically clear than the previous representation  ...  I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs.  ...  Acknowledgment: We thank Piotr Gmytrasiewicz for some useful discussions related to this work. Prashant Doshi acknowledges the support of a UGARF grant.  ... 
doi:10.1145/1329125.1329387 dblp:conf/atal/DoshiZC07 fatcat:havbf62gbfgpdkseu75fjbs5li

Penetration Testing == POMDP Solving? [article]

Carlos Sarraute (Core Security Technologies)
2013 arXiv   pre-print
Herein, we model that problem in terms of partially observable Markov decision processes (POMDPs).  ...  POMDPs make it possible to model information gathering as an integral part of the problem, thus providing for the first time a means to intelligently mix scanning actions with actual exploits.  ...  POMDP Model Generation: Generating a POMDP model for pentesting requires knowledge about possible states, actions, and observations, plus the reward function and the initial belief state.  ... 
arXiv:1306.4714v1 fatcat:qgkyaswfefcqnew7z6r3eyo2vq
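
The model-generation step named in the snippet amounts to filling in the standard POMDP tuple. A minimal sketch, with invented pentesting states and probabilities (none taken from the paper); a planner would then mix the information-gathering scan with the exploit depending on how the belief evolves:

```python
# Minimal sketch of the listed ingredients: states, actions, observations,
# transition and observation models, rewards, and an initial belief.
from dataclasses import dataclass

@dataclass
class POMDP:
    states: list
    actions: list
    observations: list
    transition: dict       # (state, action) -> {next_state: prob}
    observation_fn: dict   # (action, next_state) -> {obs: prob}
    reward: dict           # (state, action) -> float
    initial_belief: dict   # state -> prob

pentest = POMDP(
    states=["unpatched", "patched"],
    actions=["scan", "exploit"],
    observations=["old_banner", "new_banner", "shell", "no_shell"],
    # scanning and exploiting don't change the host's patch level here:
    transition={(s, a): {s: 1.0}
                for s in ["unpatched", "patched"] for a in ["scan", "exploit"]},
    observation_fn={
        ("scan", "unpatched"):    {"old_banner": 0.9, "new_banner": 0.1},
        ("scan", "patched"):      {"old_banner": 0.2, "new_banner": 0.8},
        ("exploit", "unpatched"): {"shell": 0.8, "no_shell": 0.2},
        ("exploit", "patched"):   {"shell": 0.0, "no_shell": 1.0},
    },
    reward={("unpatched", "exploit"): 10.0, ("patched", "exploit"): -5.0,
            ("unpatched", "scan"): -1.0, ("patched", "scan"): -1.0},
    initial_belief={"unpatched": 0.5, "patched": 0.5},
)
```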

Robust Asymmetric Learning in POMDPs [article]

Andrew Warrington and J. Wilder Lavington and Adam Ścibior and Mark Schmidt and Frank Wood
2021 arXiv   pre-print
Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even  ...  jointly trains the expert and the agent.  ...  Figure 2: Graphical models of an MDP (top) and a POMDP (bottom) with identical initial and state transition dynamics, p(s_t | s_{t-1}, a_t) and p(s_0), and reward function R(s_t, a_t, s_{t+1}).  ... 
arXiv:2012.15566v3 fatcat:etbg3phqnvgdtfcm2ctbawhane
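
The figure caption describes two generative models that differ only in what the agent conditions on. A toy sketch under that reading (all dynamics invented): the MDP policy sees s_t directly, while the POMDP policy only sees a noisy observation of it:

```python
# Toy generative processes matching the caption: MDP and POMDP share p(s0),
# the transition p(s_t | s_{t-1}, a_t), and the reward R(s_t, a_t, s_{t+1});
# only the policy's input differs.
import random

def step(s, a):                       # shared p(s' | s, a): noisy move on a line
    return max(0, min(4, s + a + random.choice([-1, 0, 0, 1])))

def reward(s, a, s_next):             # shared R(s, a, s')
    return 1.0 if s_next == 4 else 0.0

def observe(s):                       # POMDP-only emission p(o | s)
    return s + random.choice([-1, 0, 1])

def rollout(policy, horizon=10, partially_observable=True):
    s, ret = 0, 0.0                   # s0 ~ p(s0), deterministic here
    for _ in range(horizon):
        a = policy(observe(s) if partially_observable else s)
        s_next = step(s, a)
        ret += reward(s, a, s_next)
        s = s_next
    return ret

print(rollout(lambda x: 1))           # act on noisy observations; always move right
```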

Scalable Planning and Learning for Multiagent POMDPs

Christopher Amato, Frans Oliehoek
2015 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces.  ...  Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems.  ...  Furthermore, current factored Dec-POMDP and ND-POMDP models generate solutions given the model in an offline fashion, while we consider online methods using a simulator in this paper.  ... 
doi:10.1609/aaai.v29i1.9439 fatcat:cbabc6a4u5exhhai2irdipuszy
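
For context, a naive sketch of what sample-based online planning with a simulator looks like, and why it breaks down in the multiagent case: the joint action set below is a Cartesian product, so its size is exponential in the number of agents. The paper's factored approach, not reproduced here, is what avoids this enumeration. `simulate` and the belief particles are hypothetical stand-ins:

```python
# Estimate each joint action's value by Monte-Carlo rollouts from states
# sampled out of the current belief, then act greedily.
import itertools, random

def mc_action_selection(belief_particles, agent_actions, simulate, n_sims=100):
    joint_actions = list(itertools.product(*agent_actions))  # exponential in #agents
    best, best_val = None, float("-inf")
    for ja in joint_actions:
        val = sum(simulate(random.choice(belief_particles), ja)
                  for _ in range(n_sims)) / n_sims
        if val > best_val:
            best, best_val = ja, val
    return best

# hypothetical usage: two agents with two actions each
# a = mc_action_selection(particles, [["stay", "go"], ["stay", "go"]], simulate)
```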

Scalable Planning and Learning for Multiagent POMDPs: Extended Version [article]

Christopher Amato, Frans A. Oliehoek
2014 arXiv   pre-print
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces.  ...  Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems.  ...  Furthermore, current factored Dec-POMDP and ND-POMDP models generate solutions given the model in an offline fashion, while we consider online methods using a simulator in this paper.  ... 
arXiv:1404.1140v2 fatcat:rt5w7oxourd4xbx6h4ngv27u44

Framing Human-Robot Task Communication as a POMDP [article]

Mark P. Woodward, Robert J. Wood
2012 arXiv   pre-print
We work through an example representation of task communication as a POMDP, and present results from a user experiment on an interactive virtual robot, compared with a human-controlled virtual robot, for  ...  The results suggest that the proposed POMDP representation produces robots that are robust to teacher error, that can accurately infer task details, and that are perceived to be intelligent.  ...  In order to generate these uncertainty-reducing actions we feel that a representation allowing for hidden state is needed, and we propose the POMDP.  ... 
arXiv:1204.0280v1 fatcat:lfxvykqbvng7bkgvhc3q3qqzyq

Approximate Planning in POMDPs with Macro-Actions

Georgios Theocharous, Leslie Pack Kaelbling
2003 Neural Information Processing Systems  
Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction.  ...  We apply the algorithm to a large-scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster  ...  A well-defined framework for this interaction is the partially observable Markov decision process (POMDP) model.  ... 
dblp:conf/nips/TheocharousK03 fatcat:k5cdza2pvndjhkppw5nev2oduu
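
The temporal-abstraction idea in the snippet can be pictured as executing a fixed sequence of primitive actions open-loop, so the planner branches only at macro boundaries and visits far fewer belief points. A hedged sketch in which `env`, `update_belief`, and the macro table are all invented placeholders:

```python
# A macro-action as an open-loop sequence of primitives: belief is tracked
# throughout, but the planner only branches where macros end.

MACROS = {"go_down_corridor": ["forward"] * 5,
          "turn_around":      ["rotate_left"] * 2}

def execute_macro(belief, macro, env, update_belief):
    total_reward = 0.0
    for a in MACROS[macro]:
        obs, r = env.step(a)                  # primitive step in the world
        belief = update_belief(belief, a, obs)
        total_reward += r
    return belief, total_reward               # planner branches only here
```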

Bayesian Reinforcement Learning in Factored POMDPs [article]

Sammie Katt, Frans Oliehoek, Christopher Amato
2018 arXiv   pre-print
We also present a belief tracking method to approximate the joint posterior over state and model variables, and an adaptation of the Monte-Carlo Tree Search solution method, which together are capable  ...  This work introduces the Factored Bayes-Adaptive POMDP model, a framework that is able to exploit the underlying structure while learning the dynamics in partially observable systems.  ...  This section is divided into an introduction to the POMDP and BA-POMDP, typical solution methods for those models, and factored models.  ... 
arXiv:1811.05612v1 fatcat:fbjwunljqvfebplewz4qqfuuay
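
The belief tracking described in the snippet is over state and model jointly. One common way to approximate such a posterior is a particle filter in which each particle carries a state plus Dirichlet-style counts for the unknown dynamics; a sketch under that assumption (the paper's factored variant is not reproduced, and `sample_next` / `obs_prob` are placeholder callables):

```python
# Each particle pairs a state with counts over the unknown transition model;
# weights follow the observation likelihood.

def particle_filter_step(particles, action, obs, sample_next, obs_prob):
    new = []
    for state, counts, w in particles:
        nxt = sample_next(state, action, counts)    # draw from the particle's model
        w *= obs_prob(obs, nxt, action, counts)     # reweight by P(obs | ...)
        counts = dict(counts)                       # copy, then update the counts
        counts[(state, action, nxt)] = counts.get((state, action, nxt), 0) + 1
        new.append((nxt, counts, w))
    total = sum(w for _, _, w in new) or 1.0
    return [(s, c, w / total) for s, c, w in new]
```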

POMDP-Based Statistical Spoken Dialog Systems: A Review

Steve Young, Milica Gasic, Blaise Thomson, Jason D. Williams
2013 Proceedings of the IEEE  
However, exact model representation and optimization is computationally intractable.  ...  By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework.  ...  While exact representations of a POMDP dialog policy are possible, for example, by compressing belief space [35] or dynamically reassigning states [36], exact representations are all intractable for  ... 
doi:10.1109/jproc.2012.2225812 fatcat:x5ohjro725ejlclxvptwfcxc7e
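
The "explicit Bayesian model of uncertainty" at the heart of such systems is a belief over hidden user goals updated from noisy speech-understanding output. A self-contained toy update, with goals and confusion probabilities invented for the example:

```python
# Belief over hidden user goals, updated from one noisy language-understanding
# observation via Bayes' rule.

def update_dialog_belief(belief, obs, confusion):
    # belief: {goal: prob}; confusion[goal][obs] = P(obs | goal)
    posterior = {g: p * confusion[g].get(obs, 1e-6) for g, p in belief.items()}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

belief = {"restaurant": 0.5, "hotel": 0.5}
confusion = {"restaurant": {"heard_restaurant": 0.8, "heard_hotel": 0.2},
             "hotel":      {"heard_restaurant": 0.3, "heard_hotel": 0.7}}
print(update_dialog_belief(belief, "heard_restaurant", confusion))
# belief mass shifts toward "restaurant"
```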

Predictive representations for policy gradient in POMDPs

Abdeslam Boularias, Brahim Chaib-draa
2009 Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09  
We compare PSR policies to Finite-State Controllers (FSCs), which are considered a standard model for policy gradient methods in POMDPs.  ...  We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive State Representations  ...  Background We review the POMDP, PSR and FSC models, and show how PSRs can be adapted to represent policies.  ... 
doi:10.1145/1553374.1553383 dblp:conf/icml/BoulariasC09 fatcat:vzzyomydzfabjmzktulnsin5v4
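
Whatever internal representation the policy conditions on — an FSC node or a PSR prediction vector — the policy-gradient machinery is the same REINFORCE-style estimator over a parameterized stochastic policy. A generic sketch with a softmax policy over feature vectors; the PSR construction itself is not shown, and the features are left abstract:

```python
# REINFORCE with a softmax policy over feature vectors. In the paper's setting
# the features would be a PSR prediction vector (or a one-hot FSC node).
import numpy as np

def softmax_policy(theta, feats):            # theta: (n_actions, n_features)
    logits = theta @ feats
    p = np.exp(logits - logits.max())
    return p / p.sum()

def grad_log_pi(theta, feats, a):            # grad of log pi(a | feats) w.r.t. theta
    p = softmax_policy(theta, feats)
    g = -np.outer(p, feats)
    g[a] += feats
    return g

def reinforce_update(theta, episode, lr=0.01):
    # episode: list of (feats, action, return-to-go) triples
    for feats, a, ret in episode:
        theta = theta + lr * ret * grad_log_pi(theta, feats, a)
    return theta
```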

Anytime Planning for Decentralized POMDPs using Expectation Maximization [article]

Akshat Kumar, Shlomo Zilberstein
2012 arXiv   pre-print
An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous  ...  Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making.  ...  Acknowledgments Support for this work was provided in part by the National Science Foundation Grant IIS-0812149 and by the Air Force Office of Scientific Research Grant FA9550-08-1-0181.  ... 
arXiv:1203.3490v1 fatcat:sjwioezh3jfzdhmmdgyeeghsdi
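
EM-based planning of this kind typically relies on the planning-as-inference device of rescaling rewards into [0, 1] and treating them as the probability of a binary "success" event, so that maximizing expected reward becomes maximizing a likelihood in the DBN. A sketch of that rescaling — the general trick, not necessarily the paper's exact construction, with illustrative numbers:

```python
# Map rewards into [0, 1] and read them as P(success = 1 | s, a), so that
# likelihood maximization in the DBN maximizes expected reward.

def reward_to_likelihood(R):
    rmin, rmax = min(R.values()), max(R.values())
    span = (rmax - rmin) or 1.0
    return {sa: (r - rmin) / span for sa, r in R.items()}

R = {("s0", "a0"): -1.0, ("s0", "a1"): 2.0, ("s1", "a0"): 0.0}
print(reward_to_likelihood(R))   # e.g. ("s0", "a1") maps to success prob 1.0
```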

Experimental Results: Reinforcement Learning of POMDPs using Spectral Methods [article]

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
2017 arXiv   pre-print
While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with  ...  We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods.  ...  of the POMDP and an optimistic approach for the solution of the exploration-exploitation problem.  ... 
arXiv:1705.02553v1 fatcat:xfqurbxubjaprouc267yyjhdki
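
As a rough flavor of the method-of-moments idea behind spectral learning: low-order moments of the observation process already constrain the latent structure. The sketch below only forms an empirical pairwise co-occurrence matrix and inspects its rank; the paper's multi-view tensor decompositions and confidence bounds go far beyond this:

```python
# Empirical second-order moments of an observation sequence; for an HMM with
# k hidden states this matrix has rank at most k, which is the kind of
# structure spectral methods exploit.
import numpy as np

def empirical_pair_moments(obs_seq, n_obs):
    M2 = np.zeros((n_obs, n_obs))
    for t in range(len(obs_seq) - 1):
        M2[obs_seq[t], obs_seq[t + 1]] += 1.0
    return M2 / max(len(obs_seq) - 1, 1)

seq = np.random.randint(0, 3, size=5000)      # i.i.d. stand-in data
M2 = empirical_pair_moments(seq, 3)
print(np.linalg.matrix_rank(M2, tol=0.05))    # ~1 for i.i.d. data
```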

Representations and solutions for game-theoretic problems

Daphne Koller, Avi Pfeffer
1997 Artificial Intelligence  
This paper describes the Gala system, an implemented system that allows the specification and efficient solution of large imperfect information games.  ...  The system also provides a new declarative language for compactly and naturally representing games by their rules.  ...  Acknowledgements We are deeply grateful to Richard McKelvey and Ted Turocy for going out of their way to ensure that the GAMBIT functionality we needed for our experiments was ready  ... 
doi:10.1016/s0004-3702(97)00023-4 fatcat:5lnjlzovmrb5povybkz23bmlnq
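
The efficiency result in this line of work comes from casting game solving as linear programming over the sequence form. Shrunk to the simplest case, the same machinery solves the normal form of a two-player zero-sum game; a sketch using scipy (the sequence-form LP has the same shape over a much smaller representation):

```python
# Row player's maximin strategy for a zero-sum game via LP:
# maximize v subject to (A^T x)_j >= v for all columns j, sum(x) = 1, x >= 0.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    n, m = A.shape
    c = np.concatenate([np.zeros(n), [-1.0]])    # variables [x, v]; minimize -v
    A_ub = np.hstack([-A.T, np.ones((m, 1))])    # v - (A^T x)_j <= 0
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[n]

x, v = solve_zero_sum(np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]]))
print(x, v)   # rock-paper-scissors: uniform strategy, value 0
```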

Optimally Solving Dec-POMDPs as Continuous-State MDPs

Jilles Steeve Dibangoye, Christopher Amato, Olivier Buffet, François Charpillet
2016 The Journal of Artificial Intelligence Research  
This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time.  ...  Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (  ...  These results show that ǫ-optimal solutions can be found for larger horizons in all problems and for horizons that are sometimes an order of magnitude larger than those that have previously been solved  ... 
doi:10.1613/jair.4623 fatcat:bha4xomrwjbphbutotnrcyciqa
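
The reformulation named in the title works because the occupancy state — a distribution over hidden states and joint observation histories — evolves deterministically under a joint decision rule, yielding a continuous-state MDP that powerful planners can operate on. A sketch of that update, where `trans` and `obs_fn` are placeholder callables returning `{next_state: prob}` and `{joint_obs: prob}`:

```python
# Advance an occupancy state (a distribution over (state, joint_history)
# pairs) under a joint decision rule mapping histories to joint actions.

def next_occupancy(occupancy, decision_rule, trans, obs_fn):
    nxt = {}
    for (s, h), p in occupancy.items():
        a = decision_rule(h)                      # joint action from joint history
        for s2, pt in trans(s, a).items():
            for o, po in obs_fn(s2, a).items():
                key = (s2, h + (o,))
                nxt[key] = nxt.get(key, 0.0) + p * pt * po
    return nxt
```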
Showing results 1 — 15 out of 979 results