For partially observable Markov decision processes (POMDPs), optimal memoryless policies are generally stochastic. ... It is well known that for any finite state Markov decision process (MDP) there is a memoryless deterministic policy that maximizes the expected reward. ... Partially observable Markov decision processes A discrete time partially observable Markov decision process (POMDP) is defined by a tuple (W, S, A, α, β, R), where W is a finite set of world states, S ...arXiv:1503.07206v2 fatcat:7e7s74p5mrbkhm7rjstfcaig6q
A. 86m:90180 Sufficient statistics in a game-theoretic problem of the control of a partially observable linear diffusion process. (Russian. ... Zijm, Henk 86m:90174 The optimality equations in multichain denumerable state Markov decision processes with the average cost criterion: the bounded cost case. Statist. ...
Two control problems are considered for a partially observed Markov chain with countably infinite states. One is an infinite horizon discounted cost problem. ... Onésimo Hernandez Lerma (Mexico City) 92e:90104 90C40 Borkar, Vivek S. (6-IIS-EE) A remark on control of partially observed Markov chains. Ann. Oper. Res. 29 (1991), no. 1-4, 429-438. ...
This paper expresses the partially observable problem as a k-order Markov Decision Process (MDP) and solves it using Reinforcement Learning. ... Since local force feedback information usually does not completely determine system state, the control problem is partially observable. ... Section III poses grasp synthesis as an optimal control problem and solves it as a k-order Markov Decision Process. ...doi:10.1109/ichr.2007.4813848 dblp:conf/humanoids/Platt07 fatcat:jad6ggz5fne2ljtnr5ngxggypi
Mathematical techniques of optimization, control and decision, pp. 131-149, Birkhauser, Boston, Mass., 1981. ... They po that such a game with a discount factor has optimal value function and both players have optimal stationary strategies. ...
Next, we formulate a single-unit replacement problem as a Markov decision process and utilize the realtime signal observations to determine a replacement policy. ... We focus on exponentially increasing degradation signals and show that the optimal replacement policy for this class of problems is a monotonically nondecreasing control limit policy. ... Kharoufeh from the University of Pittsburgh for their extensive feedback and helpful insights, which helped in strengthening this paper and aiding in its publication. ...doi:10.1287/opre.1110.0912 fatcat:q64stckqtraj7bumhssvh26q2a
This leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process (POMDP). ... The problem is described by an infinite horizon, partially observed Markov game (POMG). ... This assumption allows the POMG to be converted into a partially observed Markov decision process (POMDP). ...arXiv:1404.4388v1 fatcat:p5d6v6627vca3kplycwufrxpri
avoidance constraints and searching for stationary and mobile targets. ... The mobile sensor network consists of a set of robotic sensors modeled as hybrid systems with processing capabilities. ... ACKNOWLEDGMENTS This work was supported by NSF ECCS grant #1027775, and by the Department of Energy URPR Grant #DE-FG52-04NA25590. ...doi:10.1109/cdc.2011.6161127 dblp:conf/cdc/FerrariAFL11 fatcat:ycaulpimpfhvho5qgxipybjpjq
Unsignalized intersection control is one of the most critical issues in intelligent transportation systems, which requires connected and automated vehicles to support more frequent information interaction ... with the scenario of unsignalized intersection control. ... Cooperative Multiagent Deep Deterministic Policy Gradient. In this paper, partially observable Markov games are considered, constituting a multiagent Markov decision process. The possible state S, a set of ...doi:10.1155/2020/1820527 fatcat:opkcxvn5vbhbfciytszlfmz7iy
The goal of this paper is to extend the reach of this rich and rapidly developing theory to Markov decision processes and Multiarmed bandits problems, and use this framework to solve the optimal policy ... We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006). We consider the reward function to depend on both the state and the action. ... We propose an empirical process theory approach to simulation-based optimization of Markov decision processes. ...doi:10.1016/j.automatica.2010.05.021 fatcat:liylwmxl3ngijcfnanpdntcjai
The paper develops application of techniques from robust and universal hypothesis testing for anomaly detection and change-point detection in dynamic, interconnected systems. ... This theory is extended using the concept of projected Markov models originally proposed by Claude Shannon. ... Multiple Models and Partial Information 1) Partial and Distributed Information: Suppose that we observe only a few functions of the process Z. ...doi:10.1109/cdc.2009.5400612 dblp:conf/cdc/MeynSLN09 fatcat:kkemnuftang47jnisyuirirwkq
Filtering and stochastic control. Optimal control of Markov processes. Arie Hordijk, University of Leiden, The Netherlands. Firstly we consider Markov decision chains with a denumerable state space and ... Problems of this type often arise in connection with Markov decision processes. ...doi:10.1016/0304-4149(84)90173-x fatcat:wanpovevh5bltlkl7ngbxbi2ga
The targets are modeled by a Markov motion process that is commonly used in target tracking applications. ... Since the sensors are installed on mobile robots and have limited range, the geometry of their platforms and fields-of-view play a critical role in motion planning and obstacle avoidance. ... ACKNOWLEDGMENTS This work is supported in part by the Office of Naval Research (Code 321), and by NSF grant ECS CAREER #0448906. The work of R. ...doi:10.1109/cdc.2009.5400166 dblp:conf/cdc/FerrariFT09 fatcat:jrmx4atf4zeozb7b7gt6xxnoku
For grasping and manipulation, we propose a closed-loop control process that is parametric in the number and identity of contact resources. ... A grasp controller can thus be tuned on-line to optimize performance over a variety of object geometries. ... Acknowledgements This work was supported in part by the National Science Foundation under grants CISE/CDA-9703217, IRI-9704530 and IRI-9503687. ...doi:10.1163/15685530260182927 fatcat:bizzxnp43jculbnp3zaoznuu2m
This paper presents a novel framework for studying partially observable Markov decision processes (POMDPs) with finite state, action, observation sets, and discounted rewards. ... It reveals the connection between the POMDP problem and two computational geometry problems, i.e., finding the vertices of a convex hull and finding the Minkowski sum of convex polytopes, which can help ... Acknowledgments The author thanks the associate editor and two anonymous referees for their constructive suggestions that improved the exposition of this paper, and Mahesh Nagarajan for his helpful comments ...doi:10.1287/opre.1090.0697 fatcat:rmzqivhlhbg55e3euejxmsqlti