
### The simplex method is strongly polynomial for deterministic Markov decision processes [article]

Ian Post, Yinyu Ye
2013 arXiv   pre-print
We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the  ...  For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3m^2log^2 n) iterations if the discount factor is uniform and O(n^5m^3log^2 n) iterations if each action has  ...  Introduction Markov decision processes (MDPs) are a powerful tool for modeling repeated decision making in stochastic, dynamic environments.  ...

### The simplex method is strongly polynomial for deterministic Markov decision processes [chapter]

Ian Post, Yinyu Ye
2013 Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms
We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the  ...  For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3m^2log^2 n) iterations if the discount factor is uniform and O(n^5m^3log^2 n) iterations if each action  ...  Introduction Markov decision processes (MDPs) are a powerful tool for modeling repeated decision making in stochastic, dynamic environments.  ...

### The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes

Ian Post, Yinyu Ye
2015 Mathematics of Operations Research
We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the  ...  For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3m^2log^2 n) iterations if the discount factor is uniform and O(n^5m^3log^2 n) iterations if each action  ...  The authors would like to thank Kazuhisa Makino for pointing out an error in Lemma 3.3.  ...

### On the Complexity of Solving Markov Decision Problems [article]

Michael L. Littman, Thomas L. Dean, Leslie Pack Kaelbling
2013 arXiv   pre-print
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning.  ...  To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.  ...  Acknowledgments Thanks to Justin Boyan, Tony Cassandra, Anne Condon, Paul Dagum, Michael Jordan, Philip Klein, Hsueh-I Lu, Walter Ludwig, Satinder Singh, John Tsitsiklis, and Marty Puterman for pointers  ...

### The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

Yinyu Ye
2011 Mathematics of Operations Research
We prove that the classic policy-iteration method (Howard 1960), including the Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is a strongly polynomial-time algorithm  ...  Furthermore, the computational complexity of the policy-iteration method (including the Simplex method) is superior to that of the only known strongly polynomial-time interior-point algorithm ( 2005  ...  I thank Pete Veinott and four anonymous Referees for many insightful discussions and suggestions on this subject, which have greatly improved the presentation of the paper.  ...
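The policy-iteration/simplex connection in this abstract can be illustrated with a minimal sketch: Howard's policy iteration on a hypothetical 2-state, 2-action discounted MDP. The instance and all names below are illustrative assumptions, not taken from Ye's paper; switching every improvable state at once is the block-pivot analogue of the most-negative-reduced-cost simplex step.

```python
# Howard's policy iteration on a toy 2-state discounted MDP (hypothetical
# instance, for illustration only).
GAMMA = 0.9
# MDP[s][a] = (next_state, reward); transitions here are deterministic.
MDP = {
    0: [(0, 1.0), (1, 0.0)],  # state 0: stay (r=1) or move to state 1 (r=0)
    1: [(1, 2.0), (0, 0.0)],  # state 1: stay (r=2) or move to state 0 (r=0)
}

def evaluate(policy):
    """Solve (I - GAMMA * P_pi) v = r_pi exactly for the 2-state chain."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s in (0, 1):
        ns, r = MDP[s][policy[s]]
        A[s][s] += 1.0
        A[s][ns] -= GAMMA
        b[s] = r
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # Cramer's rule for 2x2
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def improve(v):
    """Greedy policy w.r.t. v: pick the action with the best reduced value."""
    return [max(range(2), key=lambda a: MDP[s][a][1] + GAMMA * v[MDP[s][a][0]])
            for s in (0, 1)]

policy = [0, 0]
while True:
    v = evaluate(policy)
    new_policy = improve(v)
    if new_policy == policy:   # no improving switch left: policy is optimal
        break
    policy = new_policy
```

On this toy instance the iteration terminates with the policy that moves from state 0 to state 1 and then stays, with values v = (18, 20).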

### Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming

Eugene A. Feinberg, Jefferson Huang, Bruno Scherrer
2014 Operations Research Letters
Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.  ...  This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem  ...  Acknowledgment The research of the first two authors was partially supported by NSF Grant CMMI-1335296.  ...

### Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space [article]

Johannes Müller, Guido Montúfar
2022 arXiv   pre-print
Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies.  ...  Taking a similar perspective in the case of partially observable Markov decision processes with memoryless stochastic policies, the problem was recently formulated as the optimization of a linear objective  ...  JM also acknowledges support from the International Max Planck Research School for Mathematics in the Sciences and the Evangelisches Studienwerk Villigst e.V.  ...
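The state-action frequency polytope mentioned in this abstract can be made concrete with a small sketch: for a fixed memoryless policy in a hypothetical fully observable toy MDP (the instance below is an assumption for illustration, not from the paper), the discounted frequencies sum to one, and the linear objective <frequencies, rewards> recovers the discounted value of the start state.

```python
# Discounted state-action frequencies of a fixed deterministic policy in a
# hypothetical 2-state MDP, and the identity
#   value(s0) = sum_{s,a} rho(s,a) * r(s,a) / (1 - GAMMA).
GAMMA = 0.9
MDP = {0: [(0, 1.0), (1, 0.0)], 1: [(1, 2.0), (0, 0.0)]}
policy = {0: 1, 1: 0}          # memoryless deterministic policy

# rho(s, a) = (1 - GAMMA) * sum_t GAMMA^t * 1{s_t = s, a_t = a}
freq = {}
s, discount = 0, 1.0
for _ in range(2000):          # truncate the geometric series (GAMMA^2000 ~ 0)
    a = policy[s]
    ns, _r = MDP[s][a]
    freq[(s, a)] = freq.get((s, a), 0.0) + (1 - GAMMA) * discount
    discount *= GAMMA
    s = ns

total_mass = sum(freq.values())                     # a point in the simplex
value = sum(f * MDP[s][a][1] for (s, a), f in freq.items()) / (1 - GAMMA)
```

Here the frequencies are (rho(0,1), rho(1,0)) ≈ (0.1, 0.9), total mass ≈ 1, and the recovered value is 18, matching a direct evaluation of the policy.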

### Page 2222 of Mathematical Reviews Vol. , Issue 99c [page]

1999 Mathematical Reviews
Shao Hui] (PRC-HKSTB; Kowloon) Markov decision programming for process control in batch production.  ...  In this paper a polynomial time primal network simplex algorithm for the minimum cost flow problem is developed.  ...

### Comments on: Recent progress on the combinatorial diameter of polytopes and simplicial complexes

Jesús A. De Loera
2013 TOP - An Official Journal of the Spanish Society of Statistics and Operations Research
I am also grateful to the Technische Universität München for the hospitality received during the time of writing this article.  ...  I also want to thank the editors of this volume for the invitation to contribute a commentary to this special issue.  ...  programs derived from Markov Decision Processes with Fixed Discount (which is not the setting for the other papers, but is an important case of MDPs).  ...

### Page 5394 of Mathematical Reviews Vol. , Issue 95i [page]

1995 Mathematical Reviews
The basic idea behind the numerical approximation methods is to build a discrete Markov decision process with finite state space and finite control space which is readily solvable and approximates the  ...  Summary: “This paper deals with numerical methods for the optimization of piecewise, stationary and deterministic systems.  ...

### Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming [article]

Eugene A. Feinberg, Gaojin He
2020 arXiv   pre-print
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number  ...  For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided  ...  Introduction Value and policy iteration algorithms are the major tools for solving infinite-horizon discounted Markov decision processes (MDPs).  ...

### The complexity of Policy Iteration is exponential for discounted Markov Decision Processes

Romain Hollanders, Jean-Charles Delvenne, Raphael M. Jungers
2012 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
The question of knowing whether the Policy Iteration algorithm (PI) for solving stationary Markov Decision Processes (MDPs) has exponential or (strongly) polynomial complexity has attracted much attention  ...  On the other hand, it was shown that PI runs in strongly polynomial time on discounted-reward MDPs, yet only when the discount factor is fixed beforehand.  ...  Markov Decision Processes can be solved in weakly polynomial time using Linear Programming (LP).  ...

### A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes

Archis Ghate, Robert L. Smith
2013 Operations Research
A new result by Ye  shows that Dantzig's original Simplex method with the most negative reduced cost pivoting rule  is strongly polynomial for solving stationary MDPs.  ...  This complexity bound is better than the polynomial performance of value iteration [49, 51] , and in fact, is superior to the only known strongly polynomial time interior point algorithm  for solving  ...  Similar to Dantzig's strongly polynomial time Simplex method with the most negative reduced cost pivot rule for stationary MDPs, our infinite-dimensional Simplex method uses a most negative approximate  ...

### The existence of a strongly polynomial time simplex algorithm for linear programming problems [article]

Zi-zong Yan, Xiang-jun Li, Jinhai Guo
2022 arXiv   pre-print
It is well known that whether there is a polynomial-time simplex algorithm for linear programming (LP) is among the most challenging open problems in optimization and discrete geometry.  ...  We show that there is a simplex algorithm whose number of pivoting steps does not exceed the number of variables of an LP problem.  ...  There has been recent interest in finding an algorithm like this for deterministic Markov decision processes, the generalized circulation problem, the maximum flow problem [2, 29],  ...

### Dantzig's pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles [chapter]

Thomas Dueholm Hansen, Haim Kaplan, Uri Zwick
2013 Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms
Dantzig's pivoting rule is one of the most studied pivoting rules for the simplex algorithm.  ...  This gives a strongly polynomial time algorithm for the problem that does not use Megiddo's parametric search technique.  ...  · w k Discounted, deterministic Markov decision processes In this section we prove the following theorem which is essentially a generalization of Theorem 3.1.  ...
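The minimum cost-to-time-ratio and mean-cycle problems named in this title have a classical baseline in Karp's minimum-mean-cycle algorithm, which also solves average-reward deterministic MDPs. Below is a pure-Python sketch of Karp's algorithm on a hypothetical 3-node graph (an illustrative assumption, not the paper's parametric-search-free construction).

```python
# Karp's minimum-mean-cycle algorithm:
#   mu* = min_v max_{0 <= k < n} (d_n(v) - d_k(v)) / (n - k),
# where d_k(v) is the minimum weight of a k-edge walk from `source` to v.
INF = float("inf")

def min_mean_cycle(n, edges, source=0):
    """Minimum mean edge weight over directed cycles reachable from `source`.
    `edges` is a list of (u, v, weight) triples on nodes 0..n-1."""
    d = [[INF] * n for _ in range(n + 1)]   # d[k][v]
    d[0][source] = 0.0
    for k in range(1, n + 1):               # one Bellman-Ford-style pass per k
        for u, v, w in edges:
            if d[k - 1][u] < INF and d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = INF
    for v in range(n):
        if d[n][v] == INF:                  # v carries no n-edge walk: skip
            continue
        ratios = [(d[n][v] - d[k][v]) / (n - k)
                  for k in range(n) if d[k][v] < INF]
        if ratios:
            best = min(best, max(ratios))
    return best

# Hypothetical example: one 2-cycle of mean (1+3)/2 = 2 and one of mean
# (1+1)/2 = 1; the minimum mean cycle value is therefore 1.
mu = min_mean_cycle(3, [(0, 1, 1.0), (1, 0, 3.0), (1, 2, 1.0), (2, 1, 1.0)])
```

The O(nm) table-filling step is the whole algorithm; this is the strongly polynomial benchmark against which simplex-based approaches for the deterministic MDP setting are compared.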