
Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes [article]

Guido Montufar, Keyan Ghazi-Zahedi, Nihat Ay
2016 arXiv   pre-print
For partially observable Markov decision processes (POMDPs), optimal memoryless policies are generally stochastic.  ...  It is well known that for any finite state Markov decision process (MDP) there is a memoryless deterministic policy that maximizes the expected reward.  ...  Partially observable Markov decision processes A discrete time partially observable Markov decision process (POMDP) is defined by a tuple (W, S, A, α, β, R), where W is a finite set of world states, S  ... 
arXiv:1503.07206v2 fatcat:7e7s74p5mrbkhm7rjstfcaig6q

Page 5900 of Mathematical Reviews Vol. , Issue 86m [page]

1986 Mathematical Reviews  
A. 86m:90180 Sufficient statistics in a game-theoretic problem of the control of a partially observable linear diffusion process. (Russian.  ...  Zijm, Henk 86m:90174 The optimality equations in multichain denumerable state Markov decision processes with the average cost criterion: the bounded cost case. Statist.  ... 

Page 2935 of Mathematical Reviews Vol. , Issue 92e [page]

1992 Mathematical Reviews  
Two control problems are considered for a partially observed Markov chain with countably infinite states. One is an infinite horizon discounted cost problem.  ...  Onésimo Hernandez Lerma (Mexico City) 92e:90104 90C40 Borkar, Vivek S. (6-IIS-EE) A remark on control of partially observed Markov chains. Ann. Oper. Res. 29 (1991), no. 1-4, 429-438.  ... 

Learning grasp strategies composed of contact relative motions

Robert Platt
2007 2007 7th IEEE-RAS International Conference on Humanoid Robots  
This paper expresses the partially observable problem as a k-order Markov Decision Process (MDP) and solves it using Reinforcement Learning.  ...  Since local force feedback information usually does not completely determine system state, the control problem is partially observable.  ...  Section III poses grasp synthesis as an optimal control problem and solves it as a k-order Markov Decision Process.  ... 
doi:10.1109/ichr.2007.4813848 dblp:conf/humanoids/Platt07 fatcat:jad6ggz5fne2ljtnr5ngxggypi

Page 2247 of Mathematical Reviews Vol. , Issue 83e [page]

1983 Mathematical Reviews  
Mathematical techniques of optimization, control and decision, pp. 131-149, Birkhäuser, Boston, Mass., 1981.  ...  They prove that such a game with a discount factor has optimal value function and both players have optimal stationary strategies.  ... 

Structured Replacement Policies for Components with Complex Degradation Processes and Dedicated Sensors

Alaa H. Elwany, Nagi Z. Gebraeel, Lisa M. Maillart
2011 Operations Research  
Next, we formulate a single-unit replacement problem as a Markov decision process and utilize the realtime signal observations to determine a replacement policy.  ...  We focus on exponentially increasing degradation signals and show that the optimal replacement policy for this class of problems is a monotonically nondecreasing control limit policy.  ...  Kharoufeh from the University of Pittsburgh for their extensive feedback and helpful insights, which helped in strengthening this paper and aiding in its publication.  ... 
doi:10.1287/opre.1110.0912 fatcat:q64stckqtraj7bumhssvh26q2a

Partially Observed, Multi-objective Markov Games [article]

Yanling Chang, Alan L. Erera, Chelsea C. White III
2014 arXiv   pre-print
This leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process (POMDP).  ...  The problem is described by an infinite horizon, partially observed Markov game (POMG).  ...  This assumption allows the POMG to be converted into a partially observed Markov decision process (POMDP).  ... 
arXiv:1404.4388v1 fatcat:p5d6v6627vca3kplycwufrxpri

Cooperative navigation for heterogeneous autonomous vehicles via approximate dynamic programming

Silvia Ferrari, Michael Anderson, Rafael Fierro, Wenjie Lu
2011 IEEE Conference on Decision and Control and European Control Conference  
avoidance constraints and searching for stationary and mobile targets.  ...  The mobile sensor network consists of a set of robotic sensors modeled as hybrid systems with processing capabilities.  ...  ACKNOWLEDGMENTS This work was supported by NSF ECCS grant #1027775, and by the Department of Energy URPR Grant #DE-FG52-04NA25590.  ... 
doi:10.1109/cdc.2011.6161127 dblp:conf/cdc/FerrariAFL11 fatcat:ycaulpimpfhvho5qgxipybjpjq

Cooperative Multiagent Deep Deterministic Policy Gradient (CoMADDPG) for Intelligent Connected Transportation with Unsignalized Intersection

Tianhao Wu, Mingzhi Jiang, Lin Zhang
2020 Mathematical Problems in Engineering  
Unsignalized intersection control is one of the most critical issues in intelligent transportation systems, which requires connected and automated vehicles to support more frequent information interaction  ...  with the scenario of unsignalized intersection control.  ...  Cooperative Multiagent Deep Deterministic Policy Gradient In this paper, partially observable Markov games are considered, constituting a multiagent Markov decision process. The possible state S, a set of  ... 
doi:10.1155/2020/1820527 fatcat:opkcxvn5vbhbfciytszlfmz7iy

Simulation-based optimization of Markov decision processes: An empirical process theory approach

Rahul Jain, Pravin Varaiya
2010 Automatica  
The goal of this paper is to extend the reach of this rich and rapidly developing theory to Markov decision processes and Multiarmed bandits problems, and use this framework to solve the optimal policy  ...  We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006) . We consider the reward function to depend on both the state and the action.  ...  We propose an empirical process theory approach to simulation-based optimization of Markov decision processes.  ... 
doi:10.1016/j.automatica.2010.05.021 fatcat:liylwmxl3ngijcfnanpdntcjai

Anomaly detection using projective Markov models in a distributed sensor network

Sean Meyn, Amit Surana, Yiqing Lin, Satish Narayanan
2009 Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference  
The paper develops application of techniques from robust and universal hypothesis testing for anomaly detection and change-point detection in dynamic, interconnected systems.  ...  This theory is extended using the concept of projected Markov models originally proposed by Claude Shannon.  ...  Multiple Models and Partial Information 1) Partial and Distributed Information: Suppose that we observe only a few functions of the process Z.  ... 
doi:10.1109/cdc.2009.5400612 dblp:conf/cdc/MeynSLN09 fatcat:kkemnuftang47jnisyuirirwkq

Eleventh conference on stochastic processes and their applications

1984 Stochastic Processes and their Applications  
Filtering and stochastic control Optimal control of Markov Processes Arie Hordijk, University of Leiden, The Netherlands Firstly we consider Markov decision chains with a denumerable state space and  ...  Problems of this type often arise in connection with Markov decision processes.  ... 
doi:10.1016/0304-4149(84)90173-x fatcat:wanpovevh5bltlkl7ngbxbi2ga

A geometric optimization approach to tracking maneuvering targets using a heterogeneous mobile sensor network

Silvia Ferrari, Rafael Fierro, Domagoj Tolic
2009 Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference  
The targets are modeled by a Markov motion process that is commonly used in target tracking applications.  ...  Since the sensors are installed on mobile robots and have limited range, the geometry of their platforms and fields-of-view play a critical role in motion planning and obstacle avoidance.  ...  ACKNOWLEDGMENTS This work is supported in part by the Office of Naval Research (Code 321), and by NSF grant ECS CAREER #0448906. The work of R.  ... 
doi:10.1109/cdc.2009.5400166 dblp:conf/cdc/FerrariFT09 fatcat:jrmx4atf4zeozb7b7gt6xxnoku

Acquiring state from control dynamics to learn grasping policies for robot hands

R. A. Grupen, J. A. Coelho
2002 Advanced Robotics  
For grasping and manipulation, we propose a closed-loop control process that is parametric in the number and identity of contact resources.  ...  A grasp controller can thus be tuned on-line to optimize performance over a variety of object geometries.  ...  Acknowledgements This work was supported in part by the National Science Foundation under grants CISE /CDA-9703217, IRI-9704530 and IRI-9503687.  ... 
doi:10.1163/15685530260182927 fatcat:bizzxnp43jculbnp3zaoznuu2m

Partially Observable Markov Decision Processes: A Geometric Technique and Analysis

Hao Zhang
2010 Operations Research  
This paper presents a novel framework for studying partially observable Markov decision processes (POMDPs) with finite state, action, observation sets, and discounted rewards.  ...  It reveals the connection between the POMDP problem and two computational geometry problems, i.e., finding the vertices of a convex hull and finding the Minkowski sum of convex polytopes, which can help  ...  Acknowledgments The author thanks the associate editor and two anonymous referees for their constructive suggestions that improved the exposition of this paper, and Mahesh Nagarajan for his helpful comments  ... 
doi:10.1287/opre.1090.0697 fatcat:rmzqivhlhbg55e3euejxmsqlti