Improved Strong Worst-case Upper Bounds for MDP Planning
2017
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Specifically, we furnish improved strong worst-case upper bounds on the running time of MDP planning. ...
Our contributions are to this general case. For k >= 3, the tightest strong upper bound shown to date for MDP planning belongs to a family of algorithms called Policy Iteration. ...
Table 1: Summary of tightest known strong upper bounds for MDP planning. ...
doi:10.24963/ijcai.2017/248
dblp:conf/ijcai/GuptaK17
fatcat:rqtv4lgsybhtpgkltz64ccatmq
Algorithms and Conditional Lower Bounds for Planning Problems
2021
Artificial Intelligence
We consider planning problems for graphs, Markov decision processes (MDPs), and games on graphs. ...
For the coverage problem, we present a linear-time algorithm for graphs, and quadratic conditional lower bound for MDPs and games on graphs. ...
Acknowledgments The authors are grateful to the anonymous referees for their valuable comments and suggestions to improve the presentation of the paper. A. ...
doi:10.1016/j.artint.2021.103499
fatcat:3wzfkogrhngmhkpsvjmivupdne
On the Complexity of Policy Iteration
[article]
2013
arXiv
pre-print
In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. ...
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). ...
For greedy policy iteration we proved an upper bound of O(2^n/n), and for random policy iteration we proved an upper bound of O(2^(0.78n)), both in the case that the MDP has two actions. ...
arXiv:1301.6718v1
fatcat:j7cw5flx5reazesmh5by37i5ie
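For context, both policy-iteration entries above refer to the same basic algorithm. Below is a minimal sketch of greedy policy iteration on a finite discounted MDP; the random test MDP, the discount factor, and the NumPy layout are illustrative assumptions, not code or data from either paper.

import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Greedy policy iteration. P: (A, S, S) transitions; R: (S, A) rewards."""
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)                 # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[pi, np.arange(S), :]           # row s is P[pi[s], s, :]
        R_pi = R[np.arange(S), pi]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Greedy improvement: move every state to an action of maximal Q-value.
        Q = R + gamma * np.einsum('ask,k->sa', P, V)
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):          # no improving switch left: optimal
            return pi, V
        pi = new_pi

# Usage on a tiny random two-action MDP (purely illustrative).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(2, 4))      # shape (A=2, S=4, S=4)
R = rng.random((4, 2))
pi_star, V_star = policy_iteration(P, R)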
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds
[article]
2019
arXiv
pre-print
Strong worst-case performance bounds for episodic reinforcement learning exist but fortunately in practice RL algorithms perform much better than such bounds would predict. ...
As a step towards this we derive an algorithm for finite horizon discrete MDPs and associated analysis that both yields state-of-the art worst-case regret bounds in the dominant terms and yields substantially ...
Acknowledgments The authors are grateful for the high quality feedback of the reviewers, and for the comments of Yonathan Efroni and his colleagues, including Mohammad Ghavamzadeh and Shie Mannor, who ...
arXiv:1901.00210v4
fatcat:bzjjrwfgdfdk5kdk6lior7emuu
Algorithms and Conditional Lower Bounds for Planning Problems
[article]
2018
arXiv
pre-print
We consider planning problems for graphs, Markov decision processes (MDPs), and games on graphs. ...
For the coverage problem, we present a linear-time algorithm for graphs and quadratic conditional lower bound for MDPs and games on graphs. ...
Acknowledgments The authors are grateful to the anonymous referees for their valuable comments and suggestions to improve the presentation of the paper. A. ...
arXiv:1804.07031v1
fatcat:5suwpcwwdzgujj75xgfedcizle
On the Complexity of Solving Markov Decision Problems
[article]
2013
arXiv
pre-print
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. ...
We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. ...
Acknowledgments Thanks to Justin Boyan, Tony Cassandra, Anne Condon, Paul Dagum, Michael Jordan, Philip Klein, Hsueh-I Lu, Walter Ludwig, Satinder Singh, John Tsitsiklis, and Marty Puterman for pointers ...
arXiv:1302.4971v1
fatcat:s77uwofrtfh3njhgcaqn7bwb4a
DESPOT: Online POMDP Planning with Regularization
2017
The Journal of Artificial Intelligence Research
Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. ...
The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. ...
We are grateful to the anonymous reviewers for carefully reading the manuscript and providing many suggestions which helped greatly in improving the paper. ...
doi:10.1613/jair.5328
fatcat:rk7kxw64lzccfd5rvbo3enkjnu
Simple Regret Optimization in Online Planning for Markov Decision Processes
2014
The Journal of Artificial Intelligence Research
We consider online planning in Markov decision processes (MDPs). ...
To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. ...
However, when it comes to formal guarantees on the expected performance improvement over the planning time, none of the online MCTS algorithms for MDPs breaks the barrier of the worst-case polynomial-rate ...
doi:10.1613/jair.4432
fatcat:xxccofbxv5ea3bpb2aajzlrev4
Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters
2017
Proceedings of the 11th EAI International Conference on Performance Evaluation Methodologies and Tools - VALUETOOLS 2017
In particular, the approach is defined for bounded-parameter Markov decision processes (BMDPs) [GLD00]. In this setting the worst, best and average case performance of a policy is analyzed. ...
Markov decision processes (MDPs) are a well established model for planning under uncertainty. ...
BMDPs define upper and lower bounds for the transition probabilities and rewards and allow one to analyze the worst and best case behavior. ...
doi:10.1145/3150928.3150945
dblp:conf/valuetools/Scheftelowitsch17
fatcat:ghcrcgrtdrbhtdixi2kumzpqei
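The snippet above characterizes BMDPs by interval bounds on transition probabilities; the following is a hedged sketch of one way a worst-case (pessimistic) Bellman backup over such intervals can be computed, assuming the intervals are feasible (lower bounds sum to at most 1, upper bounds to at least 1). The array layout and the sort-based construction are illustrative choices, not the paper's code.

import numpy as np

def worst_case_distribution(p_lo, p_up, V):
    """Feasible distribution in [p_lo, p_up] that minimizes the expected value of V."""
    p = p_lo.copy()
    slack = 1.0 - p.sum()                 # probability mass still to be placed
    for s in np.argsort(V):               # fill lowest-value successors first
        add = min(p_up[s] - p[s], slack)
        p[s] += add
        slack -= add
        if slack <= 0.0:
            break
    return p

def worst_case_backup(P_lo, P_up, R, V, gamma=0.95):
    """One robust backup: V'(s) = max_a [ R(s, a) + gamma * min_{p in interval} p . V ]."""
    S, A, _ = P_lo.shape
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            p = worst_case_distribution(P_lo[s, a], P_up[s, a], V)
            Q[s, a] = R[s, a] + gamma * p @ V
    return Q.max(axis=1)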
Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations
[article]
2021
arXiv
pre-print
We significantly improve the robustness of PPO, DDPG and DQN agents under a suite of strong white box adversarial attacks, including new attacks of our own. ...
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks. ...
... or does not significantly improve robustness under strong attacks. ...
arXiv:2003.08938v7
fatcat:64dqpsscovbfzm42rucdvbkvdy
On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function
[article]
2021
arXiv
pre-print
Whether the computation cost is similarly bounded remains an open question. We extend the upper bound to the near-realizable case and to the infinite-horizon discounted setup. ...
The generative model provides a local access to the MDP: The planner can ask for random transitions from previously returned states and arbitrary actions, and features are only accessible for states that ...
Acknowledgements We thank the anonymous reviewers for their helpful comments. This work was done while the authors were visiting the Simons Institute for the Theory of Computing. ...
arXiv:2102.02049v3
fatcat:nqmdydkd5rddje3s2ujijabjxa
Nearly Horizon-Free Offline Reinforcement Learning
[article]
2022
arXiv
pre-print
... d_m, we obtain nearly horizon H-free sample complexity bounds for offline reinforcement learning when the total reward is upper bounded by 1. ...
To the best of our knowledge, these are the first set of nearly horizon-free bounds for episodic time-homogeneous offline tabular MDP and linear MDP with anchor points. ...
Acknowledgement The authors thank the anonymous reviewer for their constructive feedback. TR would like to thank for the helpful discussion with Ming Yin and Yu Bai. ...
arXiv:2103.14077v3
fatcat:yxnrincc2jdzvjfcc2smdjj5xq
MOReL : Model-Based Offline Reinforcement Learning
[article]
2021
arXiv
pre-print
The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. ...
Moreover, the modular design of MOReL enables future advances in its components (e.g. generative modeling, uncertainty estimation, planning etc.) to directly translate into advances for offline RL. ...
... Emo Todorov for generously providing the MuJoCo simulator for use in this paper. Aravind Rajeswaran thanks Profs. Sham Kakade and Emo Todorov for valuable discussions. ...
arXiv:2005.05951v3
fatcat:avug6wvyebdcrcf6u4tl3ts4yi
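As a rough illustration of the pessimistic MDP (P-MDP) construction mentioned in the snippet, the toy sketch below redirects state-action pairs flagged as poorly covered to an absorbing penalty state, so that any policy's return under the sketch pessimistically approximates its true return; model, is_unknown, and kappa are hypothetical placeholders, not MOReL's actual interface.

HALT = "HALT"  # absorbing state standing in for "the model does not know this region"

def pmdp_step(model, is_unknown, state, action, kappa=-100.0):
    """One transition of a pessimistic MDP wrapped around a learned dynamics model."""
    if state == HALT or is_unknown(state, action):
        return HALT, kappa                # absorb and pay the penalty reward
    next_state, reward = model(state, action)
    return next_state, reward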
Robust Markov Decision Processes
2013
Mathematics of Operations Research
Afterwards, we determine a policy that attains the highest worst-case performance over this confidence region. ...
Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. ...
Acknowledgments The authors wish to express their gratitude to the referees for their constructive criticism which led to substantial improvements of the paper. ...
doi:10.1287/moor.1120.0566
fatcat:s32nuutu6fbpbpahrujmqazwb4
The Curse of Passive Data Collection in Batch Reinforcement Learning
[article]
2022
arXiv
pre-print
For example, when learning in episodic finite state-action Markov decision processes (MDPs) with S states and A actions, we show that even with the best (but passively chosen) logging policy, Ω(A^min(S ...
A remarkable feature of our result is the sharp characterization of the exponent that appears, which is critical for understanding what makes passive learning hard. ...
Chenjun Xiao and Bo Dai would like to thank Ofir Nachum for providing feedback on a draft of this manuscript. Ilbin Lee is supported by Discovery Grant from NSERC. ...
arXiv:2106.09973v2
fatcat:hhfxjvkwuvb2xctrznit6pzuju
Showing results 1 — 15 out of 1,122 results