Improved Strong Worst-case Upper Bounds for MDP Planning

Anchit Gupta, Shivaram Kalyanakrishnan
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
Specifically, we furnish improved STRONG WORST-CASE upper bounds on the running time of MDP planning.  ...  Our contributions are to this general case. For k >= 3, the tightest strong upper bound shown to date for MDP planning belongs to a family of algorithms called Policy Iteration.  ...  Table 1: Summary of tightest known strong upper bounds for MDP planning.  ... 
doi:10.24963/ijcai.2017/248 dblp:conf/ijcai/GuptaK17 fatcat:rqtv4lgsybhtpgkltz64ccatmq

Algorithms and Conditional Lower Bounds for Planning Problems

Krishnendu Chatterjee, Wolfgang Dvořák, Monika Henzinger, Alexander Svozil
2021 Artificial Intelligence  
We consider planning problems for graphs, Markov decision processes (MDPs), and games on graphs.  ...  For the coverage problem, we present a linear-time algorithm for graphs, and quadratic conditional lower bound for MDPs and games on graphs.  ...  Acknowledgments The authors are grateful to the anonymous referees for their valuable comments and suggestions to improve the presentation of the paper. A.  ... 
doi:10.1016/j.artint.2021.103499 fatcat:3wzfkogrhngmhkpsvjmivupdne

On the Complexity of Policy Iteration [article]

Yishay Mansour, Satinder Singh
2013 arXiv   pre-print
In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy.  ...  Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs).  ...  For greedy policy iteration we proved an upper-bound of O(2^n/n), and for random policy iteration we proved an upper-bound of O(2^(0.78n)), both in the case that the MDP has two actions.  ... 
arXiv:1301.6718v1 fatcat:j7cw5flx5reazesmh5by37i5ie

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds [article]

Andrea Zanette, Emma Brunskill
2019 arXiv   pre-print
Strong worst-case performance bounds for episodic reinforcement learning exist but fortunately in practice RL algorithms perform much better than such bounds would predict.  ...  As a step towards this we derive an algorithm for finite horizon discrete MDPs and associated analysis that both yields state-of-the-art worst-case regret bounds in the dominant terms and yields substantially  ...  Acknowledgments The authors are grateful for the high quality feedback of the reviewers, and for the comments of Yonathan Efroni and his colleagues, including Mohammad Ghavamzadeh and Shie Mannor, who  ... 
arXiv:1901.00210v4 fatcat:bzjjrwfgdfdk5kdk6lior7emuu

Algorithms and Conditional Lower Bounds for Planning Problems [article]

Krishnendu Chatterjee, Wolfgang Dvořák, Monika Henzinger and Alexander Svozil
2018 arXiv   pre-print
We consider planning problems for graphs, Markov decision processes (MDPs), and games on graphs.  ...  For the coverage problem, we present a linear-time algorithm for graphs and quadratic conditional lower bound for MDPs and games on graphs.  ...  Acknowledgments The authors are grateful to the anonymous referees for their valuable comments and suggestions to improve the presentation of the paper. A.  ... 
arXiv:1804.07031v1 fatcat:5suwpcwwdzgujj75xgfedcizle

On the Complexity of Solving Markov Decision Problems [article]

Michael L. Littman, Thomas L. Dean, Leslie Pack Kaelbling
2013 arXiv   pre-print
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning.  ...  We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly.  ...  Acknowledgments Thanks to Justin Boyan, Tony Cassandra, Anne Condon, Paul Dagum, Michael Jordan, Philip Klein, Hsueh-I Lu, Walter Ludwig, Satinder Singh, John Tsitsiklis, and Marty Puterman for pointers  ... 
arXiv:1302.4971v1 fatcat:s77uwofrtfh3njhgcaqn7bwb4a

DESPOT: Online POMDP Planning with Regularization

Nan Ye, Adhiraj Somani, David Hsu, Wee Sun Lee
2017 The Journal of Artificial Intelligence Research  
Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function.  ...  The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available.  ...  We are grateful to the anonymous reviewers for carefully reading the manuscript and providing many suggestions which helped greatly in improving the paper.  ... 
doi:10.1613/jair.5328 fatcat:rk7kxw64lzccfd5rvbo3enkjnu

Simple Regret Optimization in Online Planning for Markov Decision Processes

Z. Feldman, C. Domshlak
2014 The Journal of Artificial Intelligence Research  
We consider online planning in Markov decision processes (MDPs).  ...  To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time.  ...  However, when it comes to formal guarantees on the expected performance improvement over the planning time, none of the online MCTS algorithms for MDPs breaks the barrier of the worst-case polynomial-rate  ... 
doi:10.1613/jair.4432 fatcat:xxccofbxv5ea3bpb2aajzlrev4

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

Dimitri Scheftelowitsch, Peter Buchholz, Vahid Hashemi, Holger Hermanns
2017 Proceedings of the 11th EAI International Conference on Performance Evaluation Methodologies and Tools - VALUETOOLS 2017  
In particular, the approach is defined for bounded-parameter Markov decision processes (BMDPs) [GLD00]. In this setting the worst, best and average case performance of a policy is analyzed.  ...  Markov decision processes (MDPs) are a well established model for planning under uncertainty.  ...  BMDPs define upper and lower bounds for the transition probabilities and rewards and allow one to analyze the worst and best case behavior.  ... 
doi:10.1145/3150928.3150945 dblp:conf/valuetools/Scheftelowitsch17 fatcat:ghcrcgrtdrbhtdixi2kumzpqei

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [article]

Huan Zhang, Hongge Chen, Chaowei Xiao, Bo Li, Mingyan Liu, Duane Boning, Cho-Jui Hsieh
2021 arXiv   pre-print
We significantly improve the robustness of PPO, DDPG and DQN agents under a suite of strong white box adversarial attacks, including new attacks of our own.  ...  We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.  ...  , 15] , or does not significantly improve robustness under strong attacks.  ... 
arXiv:2003.08938v7 fatcat:64dqpsscovbfzm42rucdvbkvdy

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function [article]

Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári
2021 arXiv   pre-print
Whether the computation cost is similarly bounded remains an open question. We extend the upper bound to the near-realizable case and to the infinite-horizon discounted setup.  ...  The generative model provides a local access to the MDP: The planner can ask for random transitions from previously returned states and arbitrary actions, and features are only accessible for states that  ...  Acknowledgements We thank the anonymous reviewers for their helpful comments. This work was done while the authors were visiting the Simons Institute for the Theory of Computing.  ... 
arXiv:2102.02049v3 fatcat:nqmdydkd5rddje3s2ujijabjxa

Nearly Horizon-Free Offline Reinforcement Learning [article]

Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi
2022 arXiv   pre-print
d_m, we obtain nearly horizon H-free sample complexity bounds for offline reinforcement learning when the total reward is upper bounded by 1.  ...  To the best of our knowledge, these are the first set of nearly horizon-free bounds for episodic time-homogeneous offline tabular MDP and linear MDP with anchor points.  ...  Acknowledgement The authors thank the anonymous reviewer for their constructive feedback. TR would like to thank for the helpful discussion with Ming Yin and Yu Bai.  ... 
arXiv:2103.14077v3 fatcat:yxnrincc2jdzvjfcc2smdjj5xq

MOReL : Model-Based Offline Reinforcement Learning [article]

Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims
2021 arXiv   pre-print
The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP.  ...  Moreover, the modular design of MOReL enables future advances in its components (e.g. generative modeling, uncertainty estimation, planning etc.) to directly translate into advances for offline RL.  ...  Emo Todorov for generously providing the MuJoCo simulator for use in this paper. Aravind Rajeswaran thanks Profs. Sham Kakade and Emo Todorov for valuable discussions.  ... 
arXiv:2005.05951v3 fatcat:avug6wvyebdcrcf6u4tl3ts4yi

Robust Markov Decision Processes

Wolfram Wiesemann, Daniel Kuhn, Berç Rustem
2013 Mathematics of Operations Research  
Afterwards, we determine a policy that attains the highest worst-case performance over this confidence region.  ...  Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments.  ...  Acknowledgments The authors wish to express their gratitude to the referees for their constructive criticism which led to substantial improvements of the paper.  ... 
doi:10.1287/moor.1120.0566 fatcat:s32nuutu6fbpbpahrujmqazwb4

The Curse of Passive Data Collection in Batch Reinforcement Learning [article]

Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari
2022 arXiv   pre-print
For example, when learning in episodic finite state-action Markov decision processes (MDPs) with S states and A actions, we show that even with the best (but passively chosen) logging policy, Ω(A^min(S  ...  A remarkable feature of our result is the sharp characterization of the exponent that appears, which is critical for understanding what makes passive learning hard.  ...  Chenjun Xiao and Bo Dai would like to thank Ofir Nachum for providing feedback on a draft of this manuscript. Ilbin Lee is supported by Discovery Grant from NSERC.  ... 
arXiv:2106.09973v2 fatcat:hhfxjvkwuvb2xctrznit6pzuju