A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2010; you can also visit the original URL. The file type is `application/pdf`.
### Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

2010 · *ACM Transactions on Algorithms*
doi:10.1145/1721837.1721849 · fatcat:3p24vqhmirclfkl7ydjaxekaoy

We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). ... We also present a randomized Õ(m^{1/2} n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms. ... A Deterministic Markov Decision Process (DMDP) is a weighted directed graph G = (V, E, c), where V is a set of states, or vertices, and E ⊆ V × V is a set of actions ...
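Per the abstract's definition, a DMDP is just a weighted directed graph with a discount factor. A minimal sketch of how optimal values for such a graph can be computed by plain value iteration (this is the textbook method, not the paper's faster algorithms; the function name and toy graph are illustrative):

```python
# Illustrative value iteration for a discounted DMDP: a weighted
# directed graph G = (V, E, c) with discount factor 0 < gamma < 1.
# Textbook O(iterations * |E|) approach, not the paper's algorithms.

def dmdp_values(vertices, edges, gamma, iters=1000):
    """edges: dict mapping u -> list of (v, cost) actions."""
    value = {u: 0.0 for u in vertices}
    for _ in range(iters):
        value = {u: min(c + gamma * value[v] for v, c in edges[u])
                 for u in vertices}
    return value

# Two-state example: from 'a' you may loop (cost 2) or move to 'b'
# (cost 1); from 'b' the only action is a cost-0 self-loop.
edges = {'a': [('a', 2.0), ('b', 1.0)], 'b': [('b', 0.0)]}
vals = dmdp_values(['a', 'b'], edges, gamma=0.5)
print(vals)  # optimal from 'a': move to 'b', then loop -> value 1.0
```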
### Discounted Deterministic Markov Decision Processes and Discounted All-Pairs Shortest Paths [chapter]

2009 · *Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms*
doi:10.1137/1.9781611973068.104 · fatcat:q7xv4mgjffbn5lia2aytllyt6e

We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). ... We also present a randomized Õ(m^{1/2} n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving several previous algorithms. ... A Deterministic Markov Decision Process (DMDP) is a weighted directed graph G = (V, E, c), where V is a set of states, or vertices, and E ⊆ V × V is a set of actions ...
### Page 5031 of Mathematical Reviews Vol. , Issue 95h [page]

1995 · *Mathematical Reviews*

This paper presents a comparison between optimal randomized and deterministic strategies for semi-Markov decision processes with linear constraints. ... This paper deals with an infinite-horizon discounted cost semi-Markov decision process with linear constraints. ...
### Page 8259 of Mathematical Reviews Vol. , Issue 98M [page]

1998 · *Mathematical Reviews*

Summary: "In this paper we give three subcubic cost algorithms for the all pairs shortest distance (APSD) and path (APSP) problems. ..." (J-IBAR-C; Hitachi) Subcubic cost algorithms for the all pairs shortest path problem. (English summary) Algorithmica 20 (1998), no. 3, 309-318. ...
### Improved Strongly Polynomial Algorithms for Deterministic MDPs, 2VPI Feasibility, and Discounted All-Pairs Shortest Paths [article]

2021 · *arXiv* (pre-print)
arXiv:2110.15070v1 · fatcat:ysvlyez4cnegzfqlodl7675wom

Additionally, we show a new algorithm for the Discounted All-Pairs Shortest Paths problem, introduced by Madani et al. [TALG'10], that extends the DMDPs with optional end vertices. ... We revisit the problem of finding optimal strategies for deterministic Markov Decision Processes (DMDPs), and a closely related problem of testing feasibility of systems of m linear inequalities on n real ... , DNV20], generalized maximum flows [OV20, Vég17], and (discounted) Markov Decision Processes [HMZ13, Ye11]. ...
### Continuity of the Value of Competitive Markov Decision Processes

2003 · *Journal of Theoretical Probability*
doi:10.1023/b:jotp.0000011995.28536.ef · fatcat:jc7ahtax35gjzml65xoq5imyqy

We provide a bound for the variation of the function that assigns to every competitive Markov decision process and every discount factor its discounted value. ... This bound implies that the undiscounted value of a competitive Markov decision process is continuous in the relative interior of the space of transition rules. ... INTRODUCTION: A Markov Decision Process (MDP) is given by (i) a finite set of states S and an initial state s_1 ∈ S, (ii) a finite set of actions A, (iii) a cost function c: S × A → R, and (iv) a transition ...
### Reinforcement Learning based Stochastic Shortest Path Finding in Wireless Sensor Networks

2019 · *IEEE Access*
doi:10.1109/access.2019.2950055 · fatcat:4lsmounvafcdhldfzxsexxs5ri

We model the path-finding procedure as a Markov decision process and propose two online path-finding algorithms: the Q-SSP algorithm and the SARSA-SSP algorithm, both combined with specifically-devised average ... the global stochastic shortest path every time. ... CONCLUSION: In this paper, we tackle the stochastic shortest path problem using reinforcement learning schemes by modeling the path-searching procedure as an appropriate discounted Markov decision process ...
### Page 1811 of Mathematical Reviews Vol. , Issue 89C [page]

1989 · *Mathematical Reviews*

The P-completeness result uses a reduction from the circuit value problem, while the NC results interpret the deterministic versions as shortest path problems in (possibly infinite) directed graphs and ... employ known parallel algorithms for the shortest path problem. ...
### New prioritized value iteration for Markov decision processes

2011 · *Artificial Intelligence Review*
doi:10.1007/s10462-011-9224-z · fatcat:jteuazrrpnep7lvn4eagqbse7m

The problem of solving large Markov decision processes accurately and quickly is challenging. ... On the other hand, shortest-path methods, such as Dijkstra's algorithm, which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes ... Markov decision processes (MDPs) provide a mathematical framework for modeling sequential decision problems in uncertain dynamic environments (Bellman 1957; Puterman 2005 ...
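The connection mentioned above can be made concrete: with nonnegative action costs, a deterministic shortest-path MDP is an ordinary shortest-path problem, so Dijkstra's priority-queue algorithm solves it directly. A small sketch under that assumption (the graph and names are made up for illustration):

```python
import heapq

# Sketch of the reduction the snippet alludes to: a deterministic
# shortest-path MDP with nonnegative action costs is an ordinary
# shortest-path problem, solvable with Dijkstra's priority queue.

def cheapest_cost_to_goal(edges, source, goal):
    """edges: dict u -> list of (v, cost); returns min total cost."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            return d
        if d > dist.get(u, float('inf')):
            continue  # stale queue entry, already relaxed cheaper
        for v, c in edges.get(u, []):
            nd = d + c
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float('inf')  # goal unreachable

edges = {'s': [('a', 1.0), ('g', 4.0)], 'a': [('g', 1.0)]}
print(cheapest_cost_to_goal(edges, 's', 'g'))  # 2.0 via s -> a -> g
```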
### Discounting the Future in Systems Theory [chapter]

2003 · *Lecture Notes in Computer Science*

Discounting (or inflation) is a key paradigm in economics and has been studied in Markov decision processes as well as game theory. ... Discounting the future means that the value, today, of a unit payoff is 1 if the payoff occurs today, a if it occurs tomorrow, a^2 if it occurs the day after tomorrow, and so on, for some real-valued discount ... The one-player game structures coincide with Markov decision processes (MDPs) [10]. ...
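The discounting scheme described in this abstract, where a unit payoff t days from now is worth a^t today, can be written out in a few lines (an illustrative sketch; the function name and example stream are assumptions):

```python
# The discounted value of a payoff stream, as described above: a
# payoff on day t is worth a**t times its face value today, for a
# discount factor 0 < a < 1.

def discounted_value(payoffs, a):
    """payoffs[t] is the payoff received on day t (day 0 = today)."""
    return sum(p * a**t for t, p in enumerate(payoffs))

# A unit payoff every day with a = 0.5: the geometric series
# approaches 1 / (1 - a) = 2 as the horizon grows.
print(discounted_value([1.0] * 50, a=0.5))
```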
### Dantzig's pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles [chapter]

2013 · *Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms*
doi:10.1137/1.9781611973402.63 · dblp:conf/soda/HansenKZ14 · fatcat:7dbnw6vmtfegtfs5ggqsf3ksay

We improve Orlin's bound for shortest paths and Post and Ye's bound for deterministic MDPs with the same discount factor by a factor of n, to O(mn log n) and O(m^2 n^2 log^2 n), respectively. ... We also improve by a factor of n the bound for deterministic MDPs with varying discounts when all discount factors are close to 1. ... Discounted, deterministic Markov decision processes: In this section we prove the following theorem, which is essentially ...
### Approximation Algorithms for Orienteering and Discounted-Reward TSP

2007 · *SIAM Journal on Computing (Print)*
doi:10.1137/050645464 · fatcat:lk726adxljc4josuurofdkhy3y

This problem is motivated by an approximation to a planning problem in the Markov decision process (MDP) framework under the commonly employed infinite-horizon discounted-reward optimality criterion. ... In the Discounted-Reward TSP, instead of a length limit we are given a discount factor γ, and the goal is to maximize the total discounted reward collected, where the reward for a node reached at time t is discounted ... Markov decision process motivation: A Markov decision process (MDP) consists of a state space S, a set of actions A, a probabilistic transition function T, and a reward function R. ...
### Suboptimality Bounds for Stochastic Shortest Path Problems [article]

2012 · *arXiv* (pre-print)
arXiv:1202.3729v1 · fatcat:vf4q5afr6be7vf3flc2pgyd2jm

We consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic shortest path problems. ... Such bounds have been previously established only in the special case that "all policies are proper," in which case the dynamic programming operator is known to be a contraction, and have been shown to ... Like any discrete-time Markov decision process (MDP), a stochastic shortest path problem includes a set of states, S, and a set of control actions, U, which we assume ...
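In the contraction case this abstract refers to, the Bellman residual yields the classical bound ||V − V*||∞ ≤ ||TV − V||∞ / (1 − γ) when the operator T is a γ-contraction in the sup-norm (e.g. a discounted MDP). A toy numerical check on a two-state deterministic example (all names and numbers are illustrative, not taken from the paper):

```python
# Toy check of the Bellman-residual bound in the discounted case,
# where the dynamic programming operator T is a gamma-contraction:
#     ||V - V*||_inf  <=  ||T V - V||_inf / (1 - gamma).
# Two-state deterministic example with known optimal values.

gamma = 0.9
edges = {'a': [('a', 2.0), ('b', 1.0)], 'b': [('b', 1.0)]}
# Optimal values: loop at 'b' costs 1/(1-gamma); from 'a', move to 'b'.
v_star = {'b': 1.0 / (1 - gamma), 'a': 1.0 + gamma * (1.0 / (1 - gamma))}

def bellman(v):
    """One application of the dynamic programming operator T."""
    return {u: min(c + gamma * v[w] for w, c in edges[u]) for u in edges}

v = {'a': 5.0, 'b': 8.0}          # an arbitrary candidate solution
residual = max(abs(bellman(v)[u] - v[u]) for u in edges)
error = max(abs(v[u] - v_star[u]) for u in edges)
print(error <= residual / (1 - gamma))  # the bound holds: True
```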
### Identifiability in inverse reinforcement learning [article]

2021 · *arXiv* (pre-print)
arXiv:2106.03498v3 · fatcat:ydba4vjqr5bsfmy5yuc5j6neli

Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem, using observations of agent actions. ... For a given environment, we fully characterize the reward functions leading to a given policy and demonstrate that, given demonstrations of actions for the same reward under two distinct discount factors ... Acknowledgements: The authors acknowledge the support of the Alan Turing Institute under the Engineering and Physical Sciences Research Council grant EP/N510129/1. ...
### Approximate Policy Iteration for Semi-Markov Control Revisited

2011 · *Procedia Computer Science*
doi:10.1016/j.procs.2011.08.046 · fatcat:dcv6jiyq7zagpf6znvz5qngg64

The semi-Markov decision process can be solved via reinforcement learning without generating its transition model. ... Then, we also consider its average-reward counterpart, which requires an update based on the stochastic shortest path (SSP). ... We presented an analysis of the discounted-reward algorithm that accounts for continuous reward rates and an average-reward algorithm that bypasses the SSP update. ...

*Showing results 1 — 15 out of 1,377 results*