A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Graphical models for interactive POMDPs: representations and solutions
2008
Autonomous Agents and Multi-Agent Systems
We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision ...
I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. ...
Acknowledgment Prashant Doshi was supported in part by a grant #FA9550-08-1-0429 from the US Air Force Office of Scientific Research (AFOSR) and in part by a grant from the ...
doi:10.1007/s10458-008-9064-7
fatcat:s4cysjljpfbs7fqeuh5luci4km
Graphical models for online solutions to interactive POMDPs
2007
Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems - AAMAS '07
We develop a new graphical representation for interactive partially observable Markov decision processes (I-POMDPs) that is significantly more transparent and semantically clear than the previous representation ...
I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. ...
Acknowledgment: We thank Piotr Gmytrasiewicz for some useful discussions related to this work. Prashant Doshi acknowledges the support of a UGARF grant. ...
doi:10.1145/1329125.1329387
dblp:conf/atal/DoshiZC07
fatcat:havbf62gbfgpdkseu75fjbs5li
Penetration Testing == POMDP Solving?
[article]
2013
arXiv
pre-print
Herein, we model that problem in terms of partially observable Markov decision processes (POMDP). ...
POMDPs make it possible to model information gathering as an integral part of the problem, thus providing for the first time a means to intelligently mix scanning actions with actual exploits. ...
POMDP Model Generation Generating a POMDP model for pentesting requires knowledge about possible states, actions, and observations, plus the reward function and the initial belief state. ...
arXiv:1306.4714v1
fatcat:qgkyaswfefcqnew7z6r3eyo2vq
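The snippet above lists the ingredients needed to generate a POMDP model for pentesting: states, actions, observations, a reward function, and an initial belief. A minimal sketch of such a tuple, using hypothetical names and numbers for a toy one-host scenario (not taken from the paper), could look like:

```python
# Toy pentest POMDP (illustrative only): one host that is either patched or
# vulnerable; the attacker can scan (information gathering) or exploit.
STATES = ["patched", "vulnerable"]
ACTIONS = ["scan", "exploit"]
OBSERVATIONS = ["port_closed", "port_open"]

# P(observation | state, action): scanning is informative but noisy,
# exploiting yields no useful observation on its own.
OBS_MODEL = {
    ("patched", "scan"): {"port_closed": 0.9, "port_open": 0.1},
    ("vulnerable", "scan"): {"port_closed": 0.2, "port_open": 0.8},
    ("patched", "exploit"): {"port_closed": 0.5, "port_open": 0.5},
    ("vulnerable", "exploit"): {"port_closed": 0.5, "port_open": 0.5},
}

def reward(state, action):
    """Exploiting a vulnerable host pays off; scanning has a small cost."""
    if action == "exploit":
        return 10.0 if state == "vulnerable" else -5.0
    return -1.0  # scan cost

# Initial belief: no prior information about the host.
initial_belief = {"patched": 0.5, "vulnerable": 0.5}
```

A solver would then trade the scan cost against the risk of exploiting a patched host, which is exactly the scan/exploit mix the abstract describes.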
Robust Asymmetric Learning in POMDPs
[article]
2021
arXiv
pre-print
Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even ...
jointly trains the expert and the agent. ...
Figure 2: Graphical models of an MDP (top) and a POMDP (bottom) with identical initial and state transition dynamics, p(s_t | s_{t-1}, a_t), p(s_0), and reward function R(s_t, a_t, s_{t+1}). ...
arXiv:2012.15566v3
fatcat:etbg3phqnvgdtfcm2ctbawhane
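The figure caption above contrasts an MDP, where s_t is observed, with a POMDP, where the agent must instead maintain a belief over states. A generic Bayes-filter belief update under assumed transition and observation models (the names and numbers here are illustrative, not from the paper) can be sketched as:

```python
def belief_update(belief, action, observation, trans, obs):
    """One Bayes-filter step: b'(s') ∝ O(o | s', a) * Σ_s T(s' | s, a) b(s).

    belief: dict state -> probability
    trans:  dict (s, a) -> dict s' -> P(s' | s, a)
    obs:    dict (s', a) -> dict o -> P(o | s', a)
    """
    new_belief = {}
    for s2 in belief:
        # Predicted probability of landing in s2 after taking `action`.
        pred = sum(trans[(s, action)].get(s2, 0.0) * p for s, p in belief.items())
        # Weight by how likely the received observation is in s2.
        new_belief[s2] = obs[(s2, action)].get(observation, 0.0) * pred
    norm = sum(new_belief.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / norm for s, p in new_belief.items()}

# Example with two hypothetical states "A" and "B":
trans = {("A", "go"): {"A": 0.9, "B": 0.1}, ("B", "go"): {"B": 1.0}}
obs = {("A", "go"): {"x": 0.8, "y": 0.2}, ("B", "go"): {"x": 0.3, "y": 0.7}}
posterior = belief_update({"A": 0.5, "B": 0.5}, "go", "x", trans, obs)
```

The belief, rather than the hidden state, is what a POMDP policy conditions on, which is why the two graphical models in the figure differ only at the observation node.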
Scalable Planning and Learning for Multiagent POMDPs
2015
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. ...
Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems. ...
Furthermore, current factored Dec-POMDP and ND-POMDP models generate solutions given the model in an offline fashion, while we consider online methods using a simulator in this paper. ...
doi:10.1609/aaai.v29i1.9439
fatcat:cbabc6a4u5exhhai2irdipuszy
Scalable Planning and Learning for Multiagent POMDPs: Extended Version
[article]
2014
arXiv
pre-print
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. ...
Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems. ...
Furthermore, current factored Dec-POMDP and ND-POMDP models generate solutions given the model in an offline fashion, while we consider online methods using a simulator in this paper. ...
arXiv:1404.1140v2
fatcat:rt5w7oxourd4xbx6h4ngv27u44
Framing Human-Robot Task Communication as a POMDP
[article]
2012
arXiv
pre-print
We work through an example representation of task communication as a POMDP, and present results from a user experiment on an interactive virtual robot, compared with a human controlled virtual robot, for ...
The results suggest that the proposed POMDP representation produces robots that are robust to teacher error, that can accurately infer task details, and that are perceived to be intelligent. ...
In order to generate these uncertainty-reducing actions we feel that a representation allowing for hidden state is needed, and we propose the POMDP. ...
arXiv:1204.0280v1
fatcat:lfxvykqbvng7bkgvhc3q3qqzyq
Approximate Planning in POMDPs with Macro-Actions
2003
Neural Information Processing Systems
Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. ...
We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster ...
A well defined framework for this interaction is the partially observable Markov decision process (POMDP) model. ...
dblp:conf/nips/TheocharousK03
fatcat:k5cdza2pvndjhkppw5nev2oduu
Bayesian Reinforcement Learning in Factored POMDPs
[article]
2018
arXiv
pre-print
We also present a belief tracking method to approximate the joint posterior over state and model variables, and an adaptation of the Monte-Carlo Tree Search solution method, which together are capable ...
This work introduces the Factored Bayes-Adaptive POMDP model, a framework that is able to exploit the underlying structure while learning the dynamics in partially observable systems. ...
This section is divided into an introduction to the POMDP and BA-POMDP, typical solution methods for those models, and factored models. ...
arXiv:1811.05612v1
fatcat:fbjwunljqvfebplewz4qqfuuay
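The snippet above mentions a belief tracking method that approximates the joint posterior over state and model variables, used alongside Monte-Carlo Tree Search. A common sample-based way to track a POMDP belief with only a black-box simulator is rejection-sampling particle filtering; the sketch below shows that general idea (it is not the authors' algorithm, and all names are illustrative):

```python
import random

def particle_belief_update(particles, action, observation, simulate, n=None):
    """Approximate belief tracking by rejection sampling.

    particles: list of sampled states representing the current belief
    simulate:  callable (state, action) -> (next_state, observation),
               a black-box generative model of the POMDP
    """
    n = n or len(particles)
    new_particles, attempts = [], 0
    while len(new_particles) < n:
        attempts += 1
        if attempts > 10000 * n:
            raise RuntimeError("observation too unlikely; filter degenerated")
        s = random.choice(particles)   # sample a state from the belief
        s2, o = simulate(s, action)    # simulate one step forward
        if o == observation:           # keep only samples matching reality
            new_particles.append(s2)
    return new_particles
```

In a Bayes-adaptive setting each particle would carry a (state, model-parameters) pair rather than a state alone, so the same filter tracks the joint posterior the abstract refers to.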
POMDP-Based Statistical Spoken Dialog Systems: A Review
2013
Proceedings of the IEEE
However, exact model representation and optimization is computationally intractable. ...
By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. ...
While exact representations of a POMDP dialog policy are possible, for example, by compressing belief space [35] or dynamically reassigning states [36], exact representations are all intractable for ...
doi:10.1109/jproc.2012.2225812
fatcat:x5ohjro725ejlclxvptwfcxc7e
Predictive representations for policy gradient in POMDPs
2009
Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09
We compare PSR policies to Finite-State Controllers (FSCs), which are considered as a standard model for policy gradient methods in POMDPs. ...
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive State Representations ...
Background We review the POMDP, PSR and FSC models, and show how PSRs can be adapted to represent policies. ...
doi:10.1145/1553374.1553383
dblp:conf/icml/BoulariasC09
fatcat:vzzyomydzfabjmzktulnsin5v4
Anytime Planning for Decentralized POMDPs using Expectation Maximization
[article]
2012
arXiv
pre-print
An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous ...
Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. ...
Acknowledgments Support for this work was provided in part by the National Science Foundation Grant IIS-0812149 and by the Air Force Office of Scientific Research Grant FA9550-08-1-0181. ...
arXiv:1203.3490v1
fatcat:sjwioezh3jfzdhmmdgyeeghsdi
Experimental results: Reinforcement Learning of POMDPs using Spectral Methods
[article]
2017
arXiv
pre-print
While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with ...
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. ...
of the POMDP and an optimistic approach for the solution of the exploration-exploitation problem. ...
arXiv:1705.02553v1
fatcat:xfqurbxubjaprouc267yyjhdki
Representations and solutions for game-theoretic problems
1997
Artificial Intelligence
This paper describes the Gala system, an implemented system that allows the specification and efficient solution of large imperfect information games. ...
The system also provides a new declarative language for compactly and naturally representing games by their rules. ...
Acknowledgements We are deeply grateful to Richard McKelvey and Ted Turocy for going out of their way to ensure that the GAMBIT functionality we needed for our experiments was ready ...
doi:10.1016/s0004-3702(97)00023-4
fatcat:5lnjlzovmrb5povybkz23bmlnq
Optimally Solving Dec-POMDPs as Continuous-State MDPs
2016
The Journal of Artificial Intelligence Research
This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. ...
Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally ( ...
These results show that ε-optimal solutions can be found for larger horizons in all problems and for horizons that are sometimes an order of magnitude larger than those that have previously been solved ...
doi:10.1613/jair.4623
fatcat:bha4xomrwjbphbutotnrcyciqa