Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

2010
ACM Transactions on Algorithms
We present two new algorithms for finding optimal strategies for *discounted*, infinite-horizon, *Deterministic* *Markov* *Decision* *Processes* (DMDP). ... We also present a randomized Õ(m^{1/2} n^2)-time algorithm for finding *Discounted* *All*-*Pairs* *Shortest* *Paths* (DAPSP), improving several previous algorithms. ... *Deterministic* *Markov* *decision* *processes*: A *Deterministic* *Markov* *Decision* *Process* (DMDP) is a weighted directed graph G = (V, E, c), where V is a set of states, or vertices, E ⊆ V × V is a set of actions ...

doi:10.1145/1721837.1721849
fatcat:3p24vqhmirclfkl7ydjaxekaoy
###
Discounted Deterministic Markov Decision Processes and Discounted All-Pairs Shortest Paths
2009
Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
We present two new algorithms for finding optimal strategies for *discounted*, infinite-horizon, *Deterministic* *Markov* *Decision* *Processes* (DMDP). ... We also present a randomized Õ(m^{1/2} n^2)-time algorithm for finding *Discounted* *All*-*Pairs* *Shortest* *Paths* (DAPSP), improving several previous algorithms. ... *Deterministic* *Markov* *decision* *processes*: A *Deterministic* *Markov* *Decision* *Process* (DMDP) is a weighted directed graph G = (V, E, c), where V is a set of states, or vertices, E ⊆ V × V is a set of actions ...

doi:10.1137/1.9781611973068.104
fatcat:q7xv4mgjffbn5lia2aytllyt6e
###
Page 5031 of Mathematical Reviews Vol. , Issue 95h
1995
Mathematical Reviews
This paper presents a comparison between optimal randomized *and* *deterministic* strategies for semi-*Markov* *decision* *processes* with linear constraints. ... This paper deals with an infinite horizon *discounted* cost semi-*Markov* *decision* *process* with linear constraints. ...
###
Page 8259 of Mathematical Reviews Vol. , Issue 98M
1998
Mathematical Reviews
Summary: “In this paper we give three subcubic cost algorithms for the *all* *pairs* *shortest* distance (APSD) *and* *path* (APSP) problems. ... (J-IBAR-C; Hitachi) Subcubic cost algorithms for the *all* *pairs* *shortest* *path* problem. (English summary) Algorithmica 20 (1998), no. 3, 309-318. ...
###
Improved Strongly Polynomial Algorithms for Deterministic MDPs, 2VPI Feasibility, and Discounted All-Pairs Shortest Paths
2021
arXiv
Additionally, we show a new algorithm for the *Discounted* *All*-*Pairs* *Shortest* *Paths* problem, introduced by Madani et al. [TALG'10], that extends the DMDPs with optional end vertices. ... We revisit the problem of finding optimal strategies for *deterministic* *Markov* *Decision* *Processes* (DMDPs), *and* a closely related problem of testing feasibility of systems of m linear inequalities on n real ... , DNV20], generalized maximum flows [OV20, Vég17], *and* (*discounted*) *Markov* *Decision* *Processes* [HMZ13, Ye11]. ...

arXiv:2110.15070v1
fatcat:ysvlyez4cnegzfqlodl7675wom
###
Continuity of the Value of Competitive Markov Decision Processes

2003
Journal of theoretical probability
We provide a bound for the variation of the function that assigns to every competitive *Markov* *decision* *process* *and* every *discount* factor its *discounted* value. ... This bound implies that the undiscounted value of a competitive *Markov* *decision* *process* is continuous in the relative interior of the space of transition rules. ... INTRODUCTION: A *Markov* *Decision* *Process* (MDP) is given by (i) a finite set of states S *and* an initial state s_1 ∈ S, (ii) a finite set of actions A, (iii) a cost function c: S × A → R, *and* (iv) a transition ...

doi:10.1023/b:jotp.0000011995.28536.ef
fatcat:jc7ahtax35gjzml65xoq5imyqy
###
Reinforcement Learning based Stochastic Shortest Path Finding in Wireless Sensor Networks

2019
IEEE Access
We model the *path*-finding procedure as a *Markov* *decision* *process* *and* propose two online *path*-finding algorithms: Q SSP algorithm *and* SARSA SSP algorithm, both combined with specifically-devised average ... the global stochastic *shortest* *path* every time. ... CONCLUSION: In this paper, we tackle the stochastic *shortest* *path* problem using reinforcement learning schemes by modeling the *path* searching procedure as an appropriate *discounted* *Markov* *decision* *process* ...

doi:10.1109/access.2019.2950055
fatcat:4lsmounvafcdhldfzxsexxs5ri
###
Page 1811 of Mathematical Reviews Vol. , Issue 89C
1989
Mathematical Reviews
The P-completeness result uses a reduction from the circuit value problem, while the NC-results interpret the *deterministic* versions as *shortest* *path* problems in (possibly infinite) directed graphs *and* ... employ known parallel algorithms for the *shortest* *path* problem. ...
###
New prioritized value iteration for Markov decision processes

2011
Artificial Intelligence Review
The problem of solving large *Markov* *decision* *processes* accurately *and* quickly is challenging. ... On the other hand, *shortest* *path* methods, such as Dijkstra's algorithm which is based on priority queues, have been applied successfully to the solution of *deterministic* *shortest*-*path* *Markov* *decision* *processes* ... *Markov* *Decision* *Processes*: *Markov* *decision* *processes* (MDPs) provide a mathematical framework for modeling sequential *decision* problems in uncertain dynamic environments (Bellman 1957) (Puterman 2005 ...

doi:10.1007/s10462-011-9224-z
fatcat:jteuazrrpnep7lvn4eagqbse7m
###
Discounting the Future in Systems Theory
2003
Lecture Notes in Computer Science
*Discounting* (or inflation) is a key paradigm in economics *and* has been studied in *Markov* *decision* *processes* as well as game theory. ... *Discounting* the future means that the value, today, of a unit payoff is 1 if the payoff occurs today, a if it occurs tomorrow, a^2 if it occurs the day after tomorrow, *and* so on, for some real-valued *discount* ... The one-player game structures coincide with *Markov* *decision* *processes* (MDPs) [10]. ...
###
Dantzig's pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles
2013
Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms
We improve Orlin's bound for *shortest* *paths* *and* Post *and* Ye's bound for *deterministic* MDPs with the same *discount* factor by a factor of n to O(mn log n), *and* O(m^2 n^2 log^2 n), respectively. ... We also improve by a factor of n the bound for *deterministic* MDPs with varying *discounts* when *all* *discount* factors are close to 1. ... *Discounted*, *deterministic* *Markov* *decision* *processes*: In this section we prove the following theorem which is essentially ...

doi:10.1137/1.9781611973402.63
dblp:conf/soda/HansenKZ14
fatcat:7dbnw6vmtfegtfs5ggqsf3ksay
###
Approximation Algorithms for Orienteering and Discounted-Reward TSP

2007
SIAM journal on computing (Print)
This problem is motivated by an approximation to a planning problem in the *Markov* *decision* *process* (MDP) framework under the commonly employed infinite horizon *discounted* reward optimality criterion. ... In the *Discounted*-Reward TSP, instead of a length limit we are given a *discount* factor γ, *and* the goal is to maximize total *discounted* reward collected, where reward for a node reached at time t is *discounted* ... *Markov* *decision* *process* motivation: A *Markov* *decision* *process* (MDP) consists of a state space S, a set of actions A, a probabilistic transition function T, *and* a reward function R. ...

doi:10.1137/050645464
fatcat:lk726adxljc4josuurofdkhy3y
###
Suboptimality Bounds for Stochastic Shortest Path Problems
2012
arXiv
We consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic *shortest* *path* problems. ... Such bounds have been previously established only in the special case that "*all* policies are proper," in which case the dynamic programming operator is known to be a contraction, *and* have been shown to ... Stochastic *shortest* *path* problem: Like any discrete-time *Markov* *decision* *process* (MDP), a stochastic *shortest* *path* problem includes a set of states, S, *and* a set of control actions, U, which we assume ...

arXiv:1202.3729v1
fatcat:vf4q5afr6be7vf3flc2pgyd2jm
###
Identifiability in inverse reinforcement learning
2021
arXiv
Inverse reinforcement learning attempts to reconstruct the reward function in a *Markov* *decision* problem, using observations of agent actions. ... For a given environment, we fully characterize the reward functions leading to a given policy *and* demonstrate that, given demonstrations of actions for the same reward under two distinct *discount* factors ... Acknowledgements: The authors acknowledge the support of the Alan Turing Institute under the Engineering *and* Physical Sciences Research Council grant EP/N510129/1. ...

arXiv:2106.03498v3
fatcat:ydba4vjqr5bsfmy5yuc5j6neli
###
Approximate Policy Iteration for Semi-Markov Control Revisited

2011
Procedia Computer Science
The semi-*Markov* *decision* *process* can be solved via reinforcement learning without generating its transition model. ... Then, we also consider its average reward counterpart, which requires an updating based on the stochastic *shortest* *path* (SSP). ... We presented an analysis of the *discounted* reward algorithm that accounts for continuous reward rates *and* an average reward algorithm that bypasses the SSP update. ...

doi:10.1016/j.procs.2011.08.046
fatcat:dcv6jiyq7zagpf6znvz5qngg64
