1,377 Hits in 7.5 sec

Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

Omid Madani, Mikkel Thorup, Uri Zwick
2010 ACM Transactions on Algorithms  
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP).  ...  We also present a randomized Õ(m^{1/2} n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms.  ...  Deterministic Markov decision processes: A Deterministic Markov Decision Process (DMDP) is a weighted directed graph G = (V, E, c), where V is a set of states, or vertices, E ⊆ V × V is a set of actions  ... 
doi:10.1145/1721837.1721849 fatcat:3p24vqhmirclfkl7ydjaxekaoy
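The snippet above defines a DMDP as a weighted directed graph G = (V, E, c). As a rough illustration of the objective only (plain value iteration, not the faster algorithms the paper presents; graph encoding and function names are illustrative):

```python
# Value iteration for a discounted DMDP modeled as a weighted directed
# graph G = (V, E, c), minimizing total discounted cost with discount
# factor lam. Assumes every state has at least one outgoing edge.

def dmdp_value_iteration(edges, n, lam=0.9, iters=1000):
    """edges: list of (u, v, cost) triples; n: number of states.
    Returns approximate optimal discounted cost-to-go values."""
    vals = [0.0] * n
    for _ in range(iters):
        vals = [min(c + lam * vals[v] for (u2, v, c) in edges if u2 == u)
                for u in range(n)]
    return vals
```

On a two-state cycle with unit costs and lam = 0.5, each state's value converges to 1 / (1 - 0.5) = 2.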

Discounted Deterministic Markov Decision Processes and Discounted All-Pairs Shortest Paths [chapter]

Omid Madani, Mikkel Thorup, Uri Zwick
2009 Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms  
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP).  ...  We also present a randomized Õ(m^{1/2} n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms.  ...  Deterministic Markov decision processes: A Deterministic Markov Decision Process (DMDP) is a weighted directed graph G = (V, E, c), where V is a set of states, or vertices, E ⊆ V × V is a set of actions  ... 
doi:10.1137/1.9781611973068.104 fatcat:q7xv4mgjffbn5lia2aytllyt6e

Page 5031 of Mathematical Reviews, Issue 95h [page]

1995 Mathematical Reviews  
This paper presents a comparison between optimal randomized and deterministic strategies for semi-Markov decision processes with linear constraints.  ...  This paper deals with an infinite-horizon discounted-cost semi-Markov decision process with linear constraints.  ... 

Page 8259 of Mathematical Reviews, Issue 98M [page]

1998 Mathematical Reviews  
Summary: “In this paper we give three subcubic cost algorithms for the all pairs shortest distance (APSD) and path (APSP) problems.  ...  (J-IBAR-C; Hitachi) Subcubic cost algorithms for the all pairs shortest path problem. (English summary) Algorithmica 20 (1998), no. 3, 309-318.  ... 

Improved Strongly Polynomial Algorithms for Deterministic MDPs, 2VPI Feasibility, and Discounted All-Pairs Shortest Paths [article]

Adam Karczmarz
2021 arXiv   pre-print
Additionally, we show a new algorithm for the Discounted All-Pairs Shortest Paths problem, introduced by Madani et al. [TALG'10], which extends DMDPs with optional end vertices.  ...  We revisit the problem of finding optimal strategies for deterministic Markov Decision Processes (DMDPs), and a closely related problem of testing feasibility of systems of m linear inequalities on n real  ...  , DNV20], generalized maximum flows [OV20, Vég17], and (discounted) Markov Decision Processes [HMZ13, Ye11].  ... 
arXiv:2110.15070v1 fatcat:ysvlyez4cnegzfqlodl7675wom

Continuity of the Value of Competitive Markov Decision Processes

Eilon Solan
2003 Journal of theoretical probability  
We provide a bound for the variation of the function that assigns to every competitive Markov decision process and every discount factor its discounted value.  ...  This bound implies that the undiscounted value of a competitive Markov decision process is continuous in the relative interior of the space of transition rules.  ...  INTRODUCTION: A Markov Decision Process (MDP) is given by (i) a finite set of states S and an initial state s₁ ∈ S, (ii) a finite set of actions A, (iii) a cost function c : S × A → R, and (iv) a transition  ... 
doi:10.1023/b:jotp.0000011995.28536.ef fatcat:jc7ahtax35gjzml65xoq5imyqy

Reinforcement Learning based Stochastic Shortest Path Finding in Wireless Sensor Networks

Wenwen Xia, Chong Di, Haonan Guo, Shenghong Li
2019 IEEE Access  
We model the path-finding procedure as a Markov decision process and propose two online path-finding algorithms, the Q SSP algorithm and the SARSA SSP algorithm, both combined with specifically devised average  ...  the global stochastic shortest path every time.  ...  CONCLUSION: In this paper, we tackle the stochastic shortest path problem using reinforcement learning schemes by modeling the path-searching procedure as an appropriate discounted Markov decision process  ... 
doi:10.1109/access.2019.2950055 fatcat:4lsmounvafcdhldfzxsexxs5ri

Page 1811 of Mathematical Reviews Vol. , Issue 89C [page]

1989 Mathematical Reviews  
The P-completeness result uses a reduction from the circuit value problem, while the NC results interpret the deterministic versions as shortest path problems in (possibly infinite) directed graphs and employ known parallel algorithms for the shortest path problem.  ... 

New prioritized value iteration for Markov decision processes

Ma. de Guadalupe Garcia-Hernandez, Jose Ruiz-Pinales, Eva Onaindia, J. Gabriel Aviña-Cervantes, Sergio Ledesma-Orozco, Edgar Alvarado-Mendez, Alberto Reyes-Ballesteros
2011 Artificial Intelligence Review  
The problem of solving large Markov decision processes accurately and quickly is challenging.  ...  On the other hand, shortest path methods, such as Dijkstra's algorithm which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes  ...  Markov Decision Processes: Markov decision processes (MDPs) provide a mathematical framework for modeling sequential decision problems in uncertain dynamic environments (Bellman 1957) (Puterman 2005  ... 
doi:10.1007/s10462-011-9224-z fatcat:jteuazrrpnep7lvn4eagqbse7m
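As the snippet notes, Dijkstra's priority-queue algorithm solves deterministic shortest-path problems exactly; a standard sketch (the adjacency-dict encoding and names are illustrative):

```python
import heapq

# Dijkstra's algorithm with a binary heap, as referenced in the
# snippet above for deterministic shortest-path problems.

def dijkstra(adj, src):
    """adj: {u: [(v, cost), ...]} with non-negative costs.
    Returns a dict of shortest distances from src."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```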

Discounting the Future in Systems Theory [chapter]

Luca de Alfaro, Thomas A. Henzinger, Rupak Majumdar
2003 Lecture Notes in Computer Science  
Discounting (or inflation) is a key paradigm in economics and has been studied in Markov decision processes as well as game theory.  ...  Discounting the future means that the value, today, of a unit payoff is 1 if the payoff occurs today, a if it occurs tomorrow, a² if it occurs the day after tomorrow, and so on, for some real-valued discount  ...  The one-player game structures coincide with Markov decision processes (MDPs) [10] .  ... 
doi:10.1007/3-540-45061-0_79 fatcat:wlrgea2abjgexnzcid3of7en6m
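The discounting scheme described in the snippet, where a unit payoff t days away is worth a^t today for a discount factor a in (0, 1), can be sketched as follows (function name is illustrative):

```python
# Present value of a payoff stream under geometric discounting, as
# described in the snippet above: payoffs[t] arrives t days from now
# and is worth payoffs[t] * a**t today.

def discounted_value(payoffs, a):
    """Total present value of payoffs[0], payoffs[1], ... at discount a."""
    return sum(p * a**t for t, p in enumerate(payoffs))
```

For example, three unit payoffs at a = 0.5 are worth 1 + 0.5 + 0.25 = 1.75 today.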

Dantzig's pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles [chapter]

Thomas Dueholm Hansen, Haim Kaplan, Uri Zwick
2013 Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms  
We improve Orlin's bound for shortest paths and Post and Ye's bound for deterministic MDPs with the same discount factor by a factor of n to O(mn log n) and O(m² n² log² n), respectively.  ...  We also improve by a factor of n the bound for deterministic MDPs with varying discounts when all discount factors are close to 1.  ...  Discounted, deterministic Markov decision processes: In this section we prove the following theorem which is essentially  ... 
doi:10.1137/1.9781611973402.63 dblp:conf/soda/HansenKZ14 fatcat:7dbnw6vmtfegtfs5ggqsf3ksay

Approximation Algorithms for Orienteering and Discounted-Reward TSP

Avrim Blum, Shuchi Chawla, David R. Karger, Terran Lane, Adam Meyerson, Maria Minkoff
2007 SIAM journal on computing (Print)  
This problem is motivated by an approximation to a planning problem in the Markov decision process (MDP) framework under the commonly employed infinite horizon discounted reward optimality criterion.  ...  In the Discounted-Reward TSP, instead of a length limit we are given a discount factor γ, and the goal is to maximize total discounted reward collected, where reward for a node reached at time t is discounted  ...  Markov decision process motivation: A Markov decision process (MDP) consists of a state space S, a set of actions A, a probabilistic transition function T, and a reward function R.  ... 
doi:10.1137/050645464 fatcat:lk726adxljc4josuurofdkhy3y

Suboptimality Bounds for Stochastic Shortest Path Problems [article]

Eric A. Hansen
2012 arXiv   pre-print
We consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic shortest path problems.  ...  Such bounds have been previously established only in the special case that "all policies are proper," in which case the dynamic programming operator is known to be a contraction, and have been shown to  ...  Stochastic shortest path problem: Like any discrete-time Markov decision process (MDP), a stochastic shortest path problem includes a set of states, S, and a set of control actions, U, which we assume  ... 
arXiv:1202.3729v1 fatcat:vf4q5afr6be7vf3flc2pgyd2jm
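In the standard discounted setting the snippet contrasts with, where the dynamic programming operator T is a γ-contraction in the max norm, the Bellman residual gives the classical bound ‖V − V*‖ ≤ ‖TV − V‖ / (1 − γ). A minimal sketch of that bound (names illustrative, and not the paper's extension to improper policies):

```python
# Suboptimality bound from the Bellman residual when the dynamic
# programming operator is a gamma-contraction in the max norm:
# the distance from v to the fixed point is at most
# max|T(v) - v| / (1 - gamma).

def residual_bound(v, tv, gamma):
    """v, tv: value vectors before/after one application of T."""
    residual = max(abs(a - b) for a, b in zip(tv, v))
    return residual / (1.0 - gamma)
```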

Identifiability in inverse reinforcement learning [article]

Haoyang Cao, Samuel N. Cohen, Lukasz Szpruch
2021 arXiv   pre-print
Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem, using observations of agent actions.  ...  For a given environment, we fully characterize the reward functions leading to a given policy and demonstrate that, given demonstrations of actions for the same reward under two distinct discount factors  ...  Acknowledgements The authors acknowledge the support of the Alan Turing Institute under the Engineering and Physical Sciences Research Council grant EP/N510129/1.  ... 
arXiv:2106.03498v3 fatcat:ydba4vjqr5bsfmy5yuc5j6neli

Approximate Policy Iteration for Semi-Markov Control Revisited

Abhijit Gosavi
2011 Procedia Computer Science  
The semi-Markov decision process can be solved via reinforcement learning without generating its transition model.  ...  Then, we also consider its average reward counterpart, which requires an updating based on the stochastic shortest path (SSP).  ...  We presented an analysis of the discounted reward algorithm that accounts for continuous reward rates and an average reward algorithm that bypasses the SSP update.  ... 
doi:10.1016/j.procs.2011.08.046 fatcat:dcv6jiyq7zagpf6znvz5qngg64
Showing results 1–15 out of 1,377 results