716 Hits in 4.3 sec

The simplex method is strongly polynomial for deterministic Markov decision processes [article]

Ian Post, Yinyu Ye
2013 arXiv   pre-print
We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the  ...  For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3 m^2 log^2 n) iterations if the discount factor is uniform and O(n^5 m^3 log^2 n) iterations if each action has  ...  Introduction Markov decision processes (MDPs) are a powerful tool for modeling repeated decision making in stochastic, dynamic environments.  ... 
arXiv:1208.5083v2 fatcat:qrvqltlxxbdg7jl63vw2sbkt6q
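
The LP these simplex results concern is the standard one for a discounted MDP: minimize the sum of state values subject to v(s) >= r(s,a) + gamma * P(s,a) @ v for every state-action pair, whose basic feasible solutions correspond to policies. Below is a minimal sketch of that formulation with made-up data; scipy's solver uses its own internal pivoting, so this illustrates only the LP, not the most-negative-reduced-cost rule the paper analyzes.

```python
# A minimal sketch (not the paper's algorithm) of the standard discounted-MDP
# LP.  All sizes and data below are made up for illustration.
import numpy as np
from scipy.optimize import linprog

n, m, gamma = 3, 2, 0.9                      # states, actions, discount factor
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n), size=(n, m))   # P[s, a] = next-state distribution
r = rng.uniform(0.0, 1.0, size=(n, m))       # r[s, a] = immediate reward

# LP: minimize sum_s v(s)  subject to  v(s) >= r(s,a) + gamma * P[s,a] @ v
A_ub, b_ub = [], []
for s in range(n):
    for a in range(m):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0                        # encodes gamma*P[s,a]@v - v(s) <= -r(s,a)
        A_ub.append(row)
        b_ub.append(-r[s, a])

res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n, method="highs")
print("optimal value function v*:", res.x)
```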

The simplex method is strongly polynomial for deterministic Markov decision processes [chapter]

Ian Post, Yinyu Ye
2013 Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms  
We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the  ...  For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3 m^2 log^2 n) iterations if the discount factor is uniform and O(n^5 m^3 log^2 n) iterations if each action  ...  Introduction Markov decision processes (MDPs) are a powerful tool for modeling repeated decision making in stochastic, dynamic environments.  ... 
doi:10.1137/1.9781611973105.105 dblp:conf/soda/PostY13 fatcat:7zn2v5my5jhvrl3ufnekvsuauy

The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes

Ian Post, Yinyu Ye
2015 Mathematics of Operations Research  
We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the  ...  For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3 m^2 log^2 n) iterations if the discount factor is uniform and O(n^5 m^3 log^2 n) iterations if each action  ...  The authors would like to thank Kazuhisa Makino for pointing out an error in Lemma 3.3.  ... 
doi:10.1287/moor.2014.0699 fatcat:oerirap3nfbwrorlark5b3hcl4

On the Complexity of Solving Markov Decision Problems [article]

Michael L. Littman, Thomas L. Dean, Leslie Pack Kaelbling
2013 arXiv   pre-print
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning.  ...  To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.  ...  Acknowledgments Thanks to Justin Boyan, Tony Cassandra, Anne Condon, Paul Dagum, Michael Jordan, Philip Klein, Hsueh-I Lu, Walter Ludwig, Satinder Singh, John Tsitsiklis, and Marty Puterman for pointers  ... 
arXiv:1302.4971v1 fatcat:s77uwofrtfh3njhgcaqn7bwb4a

The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

Yinyu Ye
2011 Mathematics of Operations Research  
We prove that the classic policy-iteration method (Howard 1960), including the Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is a strongly polynomial-time algorithm  ...  Furthermore, the computational complexity of the policy-iteration method (including the Simplex method) is superior to that of the only known strongly polynomial-time interior-point algorithm ([28] 2005  ...  I thank Pete Veinott and four anonymous Referees for many insightful discussions and suggestions on this subject, which have greatly improved the presentation of the paper.  ... 
doi:10.1287/moor.1110.0516 fatcat:elehu5k54jewvcy3fozo6xzrgu
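
For reference, a minimal sketch of the classic policy-iteration loop (Howard 1960) the paper analyzes: exact evaluation by solving one linear system per round, then a greedy improvement step, repeated until no state switches action. All data here are hypothetical.

```python
# A minimal policy-iteration sketch; P, r, gamma, and sizes are example data.
import numpy as np

n, m, gamma = 4, 3, 0.8
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n), size=(n, m))   # P[s, a] = next-state distribution
r = rng.uniform(0.0, 1.0, size=(n, m))       # r[s, a] = immediate reward

pi = np.zeros(n, dtype=int)                  # arbitrary initial policy
while True:
    # Evaluation: solve (I - gamma * P_pi) v = r_pi for the current policy.
    P_pi = P[np.arange(n), pi]
    r_pi = r[np.arange(n), pi]
    v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
    # Improvement: every state switches to its greedy action (Q-value argmax).
    pi_new = (r + gamma * P @ v).argmax(axis=1)
    if np.array_equal(pi_new, pi):
        break                                # no improving switch left: optimal
    pi = pi_new
print("optimal policy:", pi)
```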

Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Eugene A. Feinberg, Jefferson Huang, Bruno Scherrer
2014 Operations Research Letters  
Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.  ...  This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem  ...  Acknowledgment The research of the first two authors was partially supported by NSF Grant CMMI-1335296.  ... 
doi:10.1016/j.orl.2014.07.006 fatcat:vt3ycq33vnddlmuuzcvc7cpifq
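
For contrast with the exact-evaluation loop sketched above, here is a minimal sketch of the modified (optimistic) policy-iteration scheme the note concerns: the evaluation step is replaced by k Bellman backups under the current policy, so k = 1 recovers value iteration and large k approaches classic policy iteration. The function name and parameters are illustrative, not from the paper.

```python
# A minimal sketch of modified (optimistic) policy iteration.
import numpy as np

def modified_policy_iteration(P, r, gamma, k=5, sweeps=100):
    n = r.shape[0]
    v = np.zeros(n)
    for _ in range(sweeps):
        pi = (r + gamma * P @ v).argmax(axis=1)        # greedy improvement
        P_pi = P[np.arange(n), pi]
        r_pi = r[np.arange(n), pi]
        for _ in range(k):                             # partial evaluation only
            v = r_pi + gamma * P_pi @ v
    return pi, v
```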

Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space [article]

Johannes Müller, Guido Montúfar
2022 arXiv   pre-print
Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies.  ...  Taking a similar perspective in the case of partially observable Markov decision processes with memoryless stochastic policies, the problem was recently formulated as the optimization of a linear objective  ...  JM also acknowledges support from the International Max Planck Research School for Mathematics in the Sciences and the Evangelisches Studienwerk Villigst e.V.  ... 
arXiv:2205.14098v1 fatcat:ovo7lx6hrjeufioqlilkphz4te
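
The fully observable equivalence mentioned in the first snippet is the LP over discounted state-action frequencies (occupation measures). A minimal sketch with hypothetical data follows; the paper's POMDP setting replaces this simple polytope with a harder feasible region, which this sketch does not attempt.

```python
# A minimal sketch of reward maximization over state-action frequencies mu(s,a).
import numpy as np
from scipy.optimize import linprog

n, m, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n), size=(n, m))   # P[s, a] = next-state distribution
r = rng.uniform(0.0, 1.0, size=(n, m))       # r[s, a] = immediate reward
alpha = np.full(n, 1.0 / n)                  # initial state distribution

# Flow constraints: sum_a mu(s',a) - gamma * sum_{s,a} P(s'|s,a) mu(s,a) = alpha(s')
A_eq = np.zeros((n, n * m))
for s in range(n):
    for a in range(m):
        j = s * m + a                        # column index of variable mu(s,a)
        A_eq[s, j] += 1.0
        A_eq[:, j] -= gamma * P[s, a]

# Maximize sum mu(s,a) r(s,a); linprog's default bounds enforce mu >= 0.
res = linprog(c=-r.flatten(), A_eq=A_eq, b_eq=alpha, method="highs")
mu = res.x.reshape(n, m)                     # optimal state-action frequencies
print("greedy policy from mu:", mu.argmax(axis=1))
```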

Page 2222 of Mathematical Reviews Vol. , Issue 99c [page]

1991 Mathematical Reviews  
Shao Hui (PRC-HKSTB; Kowloon) Markov decision programming for process control in batch production.  ...  In this paper a polynomial time primal network simplex algorithm for the minimum cost flow problem is developed.  ... 

Comments on: Recent progress on the combinatorial diameter of polytopes and simplicial complexes

Jesús A. De Loera
2013 TOP - An Official Journal of the Spanish Society of Statistics and Operations Research  
I am also grateful to the Technische Universität München for the hospitality received during the time of writing this article.  ...  I also want to thank the editors of this volume for the invitation to contribute a commentary to this special issue.  ...  programs derived from Markov Decision Processes with Fixed Discount (which is not the setting for the other papers, but is an important case of MDPs).  ... 
doi:10.1007/s11750-013-0291-y fatcat:klst3zgaeva6rmik6ddbdtvf4a

Page 5394 of Mathematical Reviews Vol. , Issue 95i [page]

1995 Mathematical Reviews  
The basic idea behind the numerical approximation methods is to build a discrete Markov decision process with finite state space and finite control space which is readily solvable and approximates the  ...  Summary: “This paper deals with numerical methods for the optimization of piecewise, stationary and deterministic systems.  ... 

Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming [article]

Eugene A. Feinberg, Gaojin He
2020 arXiv   pre-print
This note provides upper bounds on the number of operations required for value iteration to compute a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number  ...  For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided  ...  Introduction Value and policy iteration algorithms are the major tools for solving infinite-horizon discounted Markov decision processes (MDPs).  ... 
arXiv:2001.10174v1 fatcat:gss3vsncgvd3tcx2nu4xj2hjqa
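
A minimal sketch of the value-iteration procedure whose operation count the note bounds: repeat Bellman backups until the successive-iterate gap guarantees the greedy policy is eps-optimal, using the standard stopping threshold eps*(1-gamma)/(2*gamma). Data, tolerance, and names are hypothetical.

```python
# A minimal value-iteration sketch with an eps-optimality stopping rule.
import numpy as np

def value_iteration(P, r, gamma, eps=1e-6):
    n = r.shape[0]
    v = np.zeros(n)
    # If ||v_new - v||_inf <= eps*(1-gamma)/(2*gamma), the greedy policy
    # with respect to v_new is eps-optimal (standard sup-norm argument).
    tol = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        Q = r + gamma * P @ v                # one Bellman backup, shape (n, m)
        v_new = Q.max(axis=1)
        if np.abs(v_new - v).max() <= tol:
            return Q.argmax(axis=1), v_new   # eps-optimal policy and values
        v = v_new
```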

The complexity of Policy Iteration is exponential for discounted Markov Decision Processes

Romain Hollanders, Jean-Charles Delvenne, Raphael M. Jungers
2012 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)  
The question of whether the Policy Iteration algorithm (PI) for solving stationary Markov Decision Processes (MDPs) has exponential or (strongly) polynomial complexity has attracted much attention  ...  On the other hand, it was shown that PI runs in strongly polynomial time on discounted-reward MDPs, yet only when the discount factor is fixed beforehand.  ...  Markov Decision Processes can be solved in weakly polynomial time using Linear Programming (LP) [16].  ... 
doi:10.1109/cdc.2012.6426485 dblp:conf/cdc/HollandersDJ12 fatcat:74465ayzkvhjzp6pxrjxrjuwwq

A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes

Archis Ghate, Robert L. Smith
2013 Operations Research  
A new result by Ye [51] shows that Dantzig's original Simplex method with the most negative reduced cost pivoting rule [15] is strongly polynomial for solving stationary MDPs.  ...  This complexity bound is better than the polynomial performance of value iteration [49, 51] , and in fact, is superior to the only known strongly polynomial time interior point algorithm [50] for solving  ...  Similar to Dantzig's strongly polynomial time Simplex method with the most negative reduced cost pivot rule for stationary MDPs, our infinite-dimensional Simplex method uses a most negative approximate  ... 
doi:10.1287/opre.1120.1121 fatcat:3blutyfygfetfhud2i63zju2ga

The existence of a strongly polynomial time simplex algorithm for linear programming problems [article]

Zi-zong Yan, Xiang-jun Li, Jinhai Guo
2022 arXiv   pre-print
Whether there is a polynomial-time simplex algorithm for linear programming (LP) is well known to be among the most challenging open problems in optimization and discrete geometry.  ...  We show that there is a simplex algorithm whose number of pivoting steps does not exceed the number of variables of an LP problem.  ...  There has been recent interest in finding such an algorithm for deterministic Markov decision processes [56], the generalized circulation problem [28], and the maximum flow problem [2, 29],  ... 
arXiv:2006.11466v13 fatcat:xb5gazihwvex5b7f5kkblcsqbm

Dantzig's pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles [chapter]

Thomas Dueholm Hansen, Haim Kaplan, Uri Zwick
2013 Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms  
Dantzig's pivoting rule is one of the most studied pivoting rules for the simplex algorithm.  ...  This gives a strongly polynomial time algorithm for the problem that does not use Megiddo's parametric search technique.  ...  Discounted, deterministic Markov decision processes In this section we prove the following theorem, which is essentially a generalization of Theorem 3.1.  ... 
doi:10.1137/1.9781611973402.63 dblp:conf/soda/HansenKZ14 fatcat:7dbnw6vmtfegtfs5ggqsf3ksay
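
For concreteness, a didactic revised-simplex loop using Dantzig's rule (enter the column with the most negative reduced cost), the rule studied above on shortest-path and deterministic-MDP LPs. This toy version assumes a known feasible starting basis and ignores degeneracy and cycling; it is a sketch of the pivoting rule, not the paper's algorithm.

```python
# A didactic simplex with Dantzig's pivoting rule on min c@x s.t. Ax = b, x >= 0.
import numpy as np

def dantzig_simplex(A, b, c, basis):
    A, b, c = map(np.asarray, (A, b, c))
    basis = list(basis)
    while True:
        B_inv = np.linalg.inv(A[:, basis])
        x_B = B_inv @ b                      # current basic feasible solution
        y = c[basis] @ B_inv                 # simplex multipliers
        reduced = c - y @ A                  # reduced costs of all columns
        j = int(np.argmin(reduced))          # Dantzig's rule: most negative
        if reduced[j] >= -1e-9:
            x = np.zeros(len(c))
            x[basis] = x_B
            return x                         # no negative reduced cost: optimal
        d = B_inv @ A[:, j]                  # entering column in basis coords
        mask = d > 1e-9
        if not mask.any():
            raise ValueError("LP is unbounded")
        ratios = np.full(len(d), np.inf)
        ratios[mask] = x_B[mask] / d[mask]
        i = int(np.argmin(ratios))           # ratio test picks leaving row
        basis[i] = j

# Tiny example: min -x1 - x2  s.t.  x1 + x2 + s = 1, starting from slack basis.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([-1.0, -1.0, 0.0])
print(dantzig_simplex(A, b, c, basis=[2]))   # prints [1. 0. 0.]
```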
Showing results 1 — 15 out of 716 results