49 Hits in 4.8 sec

Provably Safe PAC-MDP Exploration Using Analogies [article]

Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
<span title="2021-03-22">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics.  ...  Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.  ...  Analogous Safe-state Exploration: Given these assumptions, we now detail the main algorithmic contribution of the paper, the Analogous Safe-state Exploration (ASE) algorithm, which we later prove is Safe-PAC-MDP.  ...  (A toy sketch of the safe-set idea follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2007.03574v2">arXiv:2007.03574v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dabgchcckfeatpn77u42tre5da">fatcat:dabgchcckfeatpn77u42tre5da</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200824222533/https://arxiv.org/pdf/2007.03574v1.pdf" title="fulltext PDF download [not primary version]" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <span style="color: #f43e3e;">&#10033;</span> <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/eb/cb/ebcb3bbabf0cdd0707c51d2956ff9ccac505cd67.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2007.03574v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Lipschitz Lifelong Reinforcement Learning [article]

Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman
<span title="2021-03-22">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate.  ...  We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions.  ...  Improving PAC Exploration Using the Median of Means. In Advances in Neural Information Processing Systems 29 (NeurIPS 2016), 3898-3906. Pineau, J. 2019.  ...  (A toy sketch of the transfer bound follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2001.05411v3">arXiv:2001.05411v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/l444dmmnhzbppaeboxygv52igm">fatcat:l444dmmnhzbppaeboxygv52igm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210330165556/https://arxiv.org/pdf/2001.05411v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/6b/58/6b58a3251299044a11c43628f9cbe0e73c947da1.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2001.05411v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Provably Efficient Q-Learning with Low Switching Cost [article]

Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang
<span title="2020-02-10">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization.  ...  Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for H-step episodic MDPs that achieves sublinear regret and whose local switching cost in K episodes is O(H^3 S A log K), and  ...  B.2 Proof of Theorem 3: We first present the analogs of the lemmas used in the proof of Theorem 2.  ...  (A sketch of the low-switching update schedule follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.12849v3">arXiv:1905.12849v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/hhozgxuok5gznidnnuccegeuyq">fatcat:hhozgxuok5gznidnnuccegeuyq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200321114734/https://arxiv.org/pdf/1905.12849v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.12849v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

A Survey of Exploration Methods in Reinforcement Learning [article]

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
<span title="2021-09-02">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this article, we provide a survey of modern exploration methods in (sequential) reinforcement learning, as well as a taxonomy of exploration methods.  ...  Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments.  ...  PAC-MDP.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2109.00157v2">arXiv:2109.00157v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dlqhzwxscnfbxpt2i6rp7ovp6i">fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210905105619/https://arxiv.org/pdf/2109.00157v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/3b/e8/3be84e24b144541a8cd9030526ef2b8ef2cbfe54.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2109.00157v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL [article]

Andrea Zanette
<span title="2021-06-19">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Often these applications require us to 1) identify a near-optimal policy or to 2) estimate the value of a target policy.  ...  Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration.  ...  Provably good batch reinforcement learning without great exploration. arXiv preprint arXiv:2007.08202, 2020. Marjani, A. A. and Proutiere, A.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.08005v4">arXiv:2012.08005v4</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/i5k6bmcxtfah7jl6wdmopyu5a4">fatcat:i5k6bmcxtfah7jl6wdmopyu5a4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210625064039/https://arxiv.org/pdf/2012.08005v4.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/c4/f7/c4f7a113ddf7043a7f06d5ea12d9b3f7b1f56498.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.08005v4" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Sequential Transfer in Reinforcement Learning with a Generative Model [article]

Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli
<span title="2020-07-01">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.  ...  We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously solved ones.  ...  Alternatively, we could use the information measure I_{s,a} as an exploration bonus (Jian et al., 2019).  ...  (A toy sketch of such a bonus follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2007.00722v1">arXiv:2007.00722v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/r3jz4qk72feolg4llxqose3sky">fatcat:r3jz4qk72feolg4llxqose3sky</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200710060851/https://arxiv.org/pdf/2007.00722v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/04/89/0489c2f1fdc5c54dda0f1d9f5477526b5524327e.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2007.00722v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Of Cores: A Partial-Exploration Framework for Markov Decision Processes [article]

Jan Křetínský, Tobias Meggendorfer
<span title="2020-10-08">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
The main idea is to identify a "core" of an MDP, i.e., a subsystem where we provably remain with high probability, and to avoid computation on the less relevant rest of the state space.  ...  Although we identify the core using simulations and statistical techniques, it allows for rigorous error bounds in the analysis.  ...  Finite paths are defined analogously as elements of (S × A)* × S. We use ρ_i to refer to the i-th state in the given (in)finite path.  ...  (A toy sketch of core identification follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1906.06931v6">arXiv:1906.06931v6</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mmxgps2vdrcbnhgzmoq5h3geye">fatcat:mmxgps2vdrcbnhgzmoq5h3geye</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201018234956/https://arxiv.org/pdf/1906.06931v6.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1906.06931v6" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Policy Certificates: Towards Accountable Reinforcement Learning [article]

Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
<span title="2019-05-27">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration.  ...  The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.  ...  Approaches to safe exploration (Kakade & Langford, 2002; Pirotta et al., 2013; Thomas et al., 2015a; Ghavamzadeh et al., 2016) guarantee monotonically increasing performance by operating in a batch loop  ...  (A toy sketch of a policy certificate follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.03056v3">arXiv:1811.03056v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/l7qefuzasvhstibggnwudsgnnq">fatcat:l7qefuzasvhstibggnwudsgnnq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200909115726/https://arxiv.org/pdf/1811.03056v2.pdf" title="fulltext PDF download [not primary version]" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <span style="color: #f43e3e;">&#10033;</span> <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/01/62/01623afe89f80c1e94dbc5febae7c8c6c7467b38.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.03056v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning [article]

David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek
<span title="2019-12-03">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We use these insights to design Successor Uncertainties (SU), a cheap and easy-to-implement RVF algorithm that retains key properties of PSRL.  ...  SU is highly effective on hard tabular exploration benchmarks.  ...  Binary tree MDP. Network architecture: We use a single neural network to obtain estimates φ̂ and ψ̂.  ...  (A toy sketch of the SU action rule follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1810.06530v5">arXiv:1810.06530v5</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/v5hr3hie5ffkfpoctnc3qrc4eq">fatcat:v5hr3hie5ffkfpoctnc3qrc4eq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200912214042/https://arxiv.org/pdf/1810.06530v5.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/87/81/8781f459cdabdcc2b9550139ea2f65f77fe856df.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1810.06530v5" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Of Cores: A Partial-Exploration Framework for Markov Decision Processes

Jan Křetínský, Tobias Meggendorfer
<span title="2019-12-16">2019</span> <i title="episciences.org"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/c67srop7pbe3nciquoxzy3d6pm" style="color: black;">Logical Methods in Computer Science</a> </i> &nbsp;
The main idea is to identify a "core" of an MDP, i.e., a subsystem where we provably remain with high probability, and to avoid computation on the less relevant rest of the state space.  ...  Although we identify the core using simulations and statistical techniques, it allows for rigorous error bounds in the analysis.  ...  Finite paths are defined analogously as elements of (S × A)* × S. We use ρ_i to refer to the i-th state in the given (in)finite path.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.23638/lmcs-16(4:3)2020">doi:10.23638/lmcs-16(4:3)2020</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ulhxksmv3zg7vhhc2gsb4iy5li">fatcat:ulhxksmv3zg7vhhc2gsb4iy5li</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210120042432/https://lmcs.episciences.org/6833/pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/0b/84/0b84fc004a6ab5e0f42f9038f16d925c092314ae.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.23638/lmcs-16(4:3)2020"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> Publisher / doi.org </button> </a>

Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning [article]

Ming Yin, Yu Bai, Yu-Xiang Wang
<span title="2020-12-01">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
horizon, d_m is a quantity that reflects the exploration of the logging policy μ).  ...  Existing work on OPE mostly focuses on evaluating a fixed target policy π, which does not provide useful bounds for offline policy learning, as π will then be data-dependent.  ...  to sample complexity lower bounds in the pointwise bounded reward case; and Tengyang Xie and Nan Jiang for clarifying for us the scaling in the sample complexity of their results in (Xie & Jiang, 2020b)  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2007.03760v2">arXiv:2007.03760v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/yedxpdncnzdlfikvfzycfqlvi4">fatcat:yedxpdncnzdlfikvfzycfqlvi4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201204232541/https://arxiv.org/pdf/2007.03760v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/e9/5a/e95a1ffbd8c30aa5577483d344ca5ce057cff2cd.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2007.03760v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Reinforcement Learning in Reward-Mixing MDPs [article]

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
<span title="2022-01-31">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We study the problem of learning a near-optimal policy for two reward-mixing MDPs.  ...  In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP).  ...  More formally, we wish our algorithm to return an (ε, η) probably-approximately-correct (PAC) optimal policy, which we also refer to as a near-optimal policy, defined as follows: Definition 2 ((ε, η)-PAC  ...  (A standard rendering of this criterion follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2110.03743v2">arXiv:2110.03743v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/vmcdadqua5dkjiwqsbjb5es2qm">fatcat:vmcdadqua5dkjiwqsbjb5es2qm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220202114636/https://arxiv.org/pdf/2110.03743v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/29/6c/296c4b307cf5d8b05963de732be0814fba896f8f.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2110.03743v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning [article]

Andrea Zanette, Martin J. Wainwright, Emma Brunskill
<span title="2021-08-19">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well understood theoretically.  ...  The algorithm can operate when the Bellman evaluation operator is closed with respect to the action value function of the actor's policies; this is a more general setting than the low-rank MDP model.  ...  present computationally intractable algorithms, with the exception of [ZLKB20b] for a PAC setting with low inherent Bellman error, which however requires an additional "explorability" condition.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2108.08812v1">arXiv:2108.08812v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/jdj2xw5v7nd7voavpy62bhue5e">fatcat:jdj2xw5v7nd7voavpy62bhue5e</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210825034701/https://arxiv.org/pdf/2108.08812v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/1e/00/1e00e8760472dab0fe1632a04a037d52d227d3a6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2108.08812v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Constrained episodic reinforcement learning in concave-convex and knapsack settings [article]

Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
<span title="2021-06-06">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Acknowledgments and Disclosure of Funding: The authors would like to thank Rob Schapire for useful discussions that helped in the initial stages of this work, and Yufeng Zhang, whose careful reading of our  ...  Provably efficient safe exploration via primal-dual policy optimization. arXiv preprint arXiv:2003.00534. Efroni, Y., Mannor, S., and Pirotta, M. (2020).  ...  Benchmarking safe exploration in deep reinforcement learning. https://cdn.openai.com/safexp-short.pdf. Accessed March 11, 2020. Rosenberg, A. and Mansour, Y. (2019).  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2006.05051v2">arXiv:2006.05051v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/fyfgnnufgvge3gpvjzhhacozpa">fatcat:fyfgnnufgvge3gpvjzhhacozpa</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210611223520/https://arxiv.org/pdf/2006.05051v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/7e/d5/7ed50e8bd030093b78ab1a38567e2d426240b157.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2006.05051v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism [article]

Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell
<span title="2021-03-22">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We further show that LCB is almost adaptively optimal in MDPs.  ...  Based on the composition of the offline dataset, two main categories of methods are used: imitation learning, which is suitable for expert datasets, and vanilla offline RL, which often requires uniform coverage  ...  Safe policy improvement with baseline bootstrapping. In International Conference on Machine Learning, pages 3652-3661. PMLR, 2019. Tor Lattimore and Marcus Hutter. PAC bounds for discounted MDPs.  ...  (A toy sketch of the LCB rule follows this entry.)
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2103.12021v1">arXiv:2103.12021v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/7wbhgdjr65gx7lme7gmf35txum">fatcat:7wbhgdjr65gx7lme7gmf35txum</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210325022232/https://arxiv.org/pdf/2103.12021v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/54/70/5470617f86feabc8d6589df59eab5e151621ec1a.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2103.12021v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>
Showing results 1–15 of 49