FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
[article]
2020
arXiv
pre-print
Structurally, we make precise connections between these low rank MDPs and latent variable models, showing how they significantly generalize prior formulations for representation learning in RL. ...
Algorithmically, we develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models. ...
We address the question of learning the representation φ in a low rank MDP. To this end our contributions are both structural and algorithmic. 1. Expressiveness of low rank MDPs. ...
arXiv:2006.10814v2
fatcat:stvbyny3prbrbddy7ye74gdnza
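For context on the FLAMBE entry above (and most of the entries below): the low-rank MDP assumption is that the transition kernel factorizes through a low-dimensional feature map. A standard statement, in our notation rather than a quote from the paper, is

$$ T(s' \mid s, a) \;=\; \langle \phi^*(s,a), \mu^*(s') \rangle, \qquad \phi^*: \mathcal{S}\times\mathcal{A} \to \mathbb{R}^d, \quad \mu^*: \mathcal{S} \to \mathbb{R}^d, $$

where neither $\phi^*$ nor $\mu^*$ is known to the learner. Representation learning in this model means recovering a feature map close enough to $\phi^*$ to support exploration and planning, in contrast to the linear MDP setting where $\phi^*$ is handed to the agent.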
Representation Learning for Online and Offline RL in Low-rank MDPs
[article]
2022
arXiv
pre-print
Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings. ...
For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB ...
Acknowledgement The authors would like to thank Alekh Agarwal, Praneeth Netrapalli and Ming Yin for valuable feedback. ...
arXiv:2110.04652v3
fatcat:uqexoogxkbgjboja4ik5cj3mgy
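As a rough sketch of the recipe behind REP-UCB-style methods (our paraphrase of the general approach, not the paper's exact algorithm): fit the transition model by maximum likelihood over the candidate representation class, then act optimistically by adding an elliptical bonus built from the learned features, e.g.

$$ \hat b(s,a) \;\propto\; \sqrt{\hat\phi(s,a)^\top \hat\Sigma^{-1} \hat\phi(s,a)}, \qquad \hat\Sigma \;=\; \lambda I + \sum_{i=1}^{n} \hat\phi(s_i,a_i)\,\hat\phi(s_i,a_i)^\top, $$

so that poorly covered directions in the learned feature space are explored preferentially.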
Provable Benefits of Representational Transfer in Reinforcement Learning
[article]
2022
arXiv
pre-print
The sample complexity is close to knowing the ground truth features in the target task, and comparable to prior representation learning results in the source tasks. ...
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy ...
Acknowledgement We thank Masatoshi Uehara for the insightful discussions at the early stage of this project. ...
arXiv:2205.14571v1
fatcat:inuv5kbhnba7tc7fontmyemsjq
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
[article]
2022
arXiv
pre-print
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics ...
BRIEE interleaves latent states discovery, exploration, and exploitation together, and can provably learn a near-optimal policy with sample complexity scaling polynomially in the number of latent states ...
The second equality is specific to the block structure of our features and does not hold in general low-rank MDPs. ...
arXiv:2202.00063v2
fatcat:a4tvd3cq7zb37evpyr242vz2le
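The block MDP studied by BRIEE is the special case of a low-rank MDP in which every observation is emitted by exactly one latent state. Writing $\psi^*(x)$ for the latent state that generates observation $x$ (standard formulation, notation ours), the dynamics factor as

$$ P(x' \mid x, a) \;=\; \sum_{z'} q(x' \mid z')\, p\big(z' \mid \psi^*(x), a\big), $$

so the rank is at most the number of latent states and the natural features are (near) one-hot indicators of the decoded latent state, which is presumably the block structure the snippet's "second equality" refers to.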
Provably Efficient Representation Learning in Low-rank Markov Decision Processes
[article]
2021
arXiv
pre-print
In order to understand how representation learning can improve the efficiency of RL, we study representation learning for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel ...
The success of deep reinforcement learning (DRL) is due to the power of learning a representation that is suitable for the underlying exploration and exploitation task. ...
First, we currently study a special class (Yang and Wang, 2020) of low-rank MDPs rather than the general low-rank MDP (Yang and Wang, 2019; Jin et al., 2020). ...
arXiv:2106.11935v1
fatcat:tur44wmigrc3nkscfhoachbcxy
Model-free Representation Learning and Exploration in Low-rank MDPs
[article]
2022
arXiv
pre-print
The low rank MDP has emerged as an important model for studying representation learning and exploration in reinforcement learning. ...
In this work, we present the first model-free representation learning algorithms for low rank MDPs. ...
Acknowledgements Part of this work was done while AM was at University of Michigan and was supported in part by a grant from the Open Philanthropy Project to the Center for Human-Compatible AI, and in ...
arXiv:2102.07035v2
fatcat:dizamv2qazarxggnttuwhr6pwu
Bilinear Classes: A Structural Framework for Provable Generalization in RL
[article]
2021
arXiv
pre-print
This work introduces Bilinear Classes, a new structural framework, which permit generalization in reinforcement learning in a wide variety of settings through the use of function approximation. ...
Furthermore, this framework also extends to the infinite dimensional (RKHS) setting: for the Linear Q^*/V^* model, linear MDPs, and linear mixture MDPs, we provide sample complexities that have no ...
We thank Akshay Krishnamurthy for a discussion regarding Q/V -Bellman rank. ...
arXiv:2103.10897v3
fatcat:wlr3tmthtjfnzhfxmefqts5f6e
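The structural condition behind Bilinear Classes can be stated loosely as follows (our paraphrase): for every hypothesis $f$ in the class and every step $h$, the average Bellman error of $f$ under the roll-in distribution of the current iterate $f_t$ is controlled by a bilinear form,

$$ \Big| \mathbb{E}_{\pi_{f_t}}\big[ Q_{h,f}(s_h,a_h) - r_h - V_{h+1,f}(s_{h+1}) \big] \Big| \;\le\; \big| \langle W_h(f) - W_h(f^*),\, X_h(f_t) \rangle \big| $$

for some embeddings $W_h, X_h$ into a (possibly infinite-dimensional) Hilbert space; the sample complexity then scales with the effective dimension of these embeddings rather than with the size of the state space.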
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
[article]
2021
arXiv
pre-print
Two notable examples are: (1) low-rank MDP with representation learning where the partial coverage condition is defined using a relative condition number measured by the unknown ground truth feature representation ...
We then demonstrate that this algorithmic framework can be applied to many specialized Markov Decision Processes where additional structural assumptions can further refine the concept of partial coverage ...
Flambe: Structural complexity and representation learning of low rank mdps. In Advances in Neural Information Processing Systems, volume 33, pages 20095-20107, 2020b. ...
arXiv:2107.06226v2
fatcat:samqjfn7crgmxaeqhf3yx3snbi
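The relative condition number mentioned in the offline low-rank example is, in the usual formulation (notation ours), measured under the unknown ground-truth features $\phi^*$:

$$ C^{\pi} \;=\; \sup_{x \in \mathbb{R}^d} \frac{x^\top \mathbb{E}_{\pi}\big[\phi^*(s,a)\,\phi^*(s,a)^\top\big]\, x}{x^\top \mathbb{E}_{\rho}\big[\phi^*(s,a)\,\phi^*(s,a)^\top\big]\, x}, $$

where $\rho$ is the offline data distribution. Partial coverage asks only that $C^{\pi}$ be finite for a single comparator policy $\pi$, rather than for all policies as in full-coverage concentrability assumptions.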
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
[article]
2022
arXiv
pre-print
To narrow such a gap, we study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions. ...
For both models, we propose to extract the correct feature representations of the low-rank model by minimizing a contrastive loss. ...
Agarwal, A., Kakade, S., Krishnamurthy, A., and Sun, W. Flambe: Structural complexity and representation learning of low rank mdps. arXiv preprint arXiv:2006.10814, 2020. Moulin, H. and Vial, J. ...
arXiv:2207.14800v1
fatcat:eg555bavt5gb7ftahu4hhdbobi
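In outline, the contrastive step referenced here is a noise-contrastive discrimination problem (our sketch of the generic recipe rather than the paper's exact objective): given a transition, distinguish the true next state $s^+$ from a negative sample $s^-$, for instance by minimizing a logistic loss over a function class $\mathcal{F}$,

$$ \min_{f \in \mathcal{F}} \; \mathbb{E}\Big[ \log\!\big(1 + e^{-f(s,a,s^+)}\big) + \log\!\big(1 + e^{\,f(s,a,s^-)}\big) \Big]. $$

The population minimizer is a monotone transform of the density ratio $T(s' \mid s,a)/\rho(s')$, and for low-rank transitions that ratio inherits the factorized form $\langle \phi^*(s,a), \mu^*(s') \rangle / \rho(s')$, which is what allows a feature representation to be read off from the learned discriminator.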
Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL
[article]
2022
arXiv
pre-print
We then particularize the SWEET framework to the tabular and the low-rank MDP settings, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively. ...
While the primary goal of the exploration phase in reward-free reinforcement learning (RF-RL) is to reduce the uncertainty in the estimated model with minimum number of trajectories, in practice, the agent ...
Agarwal et al. (2020) studies low-rank MDPs and proposes FLAMBE, whose learning objective can be translated to a reward-free learning goal with sample complexity Õ(H^22 d^7 A^9 / ϵ^10). ...
arXiv:2206.14057v1
fatcat:su7lakzgofaetnks56ajg7wthm
Robust Policy Gradient against Strong Data Corruption
[article]
2021
arXiv
pre-print
We study the problem of robust reinforcement learning under adversarial corruption on both rewards and transitions. ...
Our attack model assumes an adaptive adversary who can arbitrarily corrupt the reward and transition at every step within an episode, for at most ϵ-fraction of the learning episodes. ...
Flambe: Structural complexity and representation learning of low rank mdps. Advances in Neural Information Processing Systems, 33, 2020b. ...
arXiv:2102.05800v3
fatcat:4xzl4rurpnflhbx3y2fqmhhsne
Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
[article]
2021
arXiv
pre-print
We provide an algorithm for this setting whose error is bounded in terms of the rank d of the underlying MDP. ...
Specifically, our algorithm enjoys a sample complexity bound of O((H^{4d} K^{3d} log|Π|)/ϵ^2), where H is the length of episodes, K is the number of actions, and ϵ>0 is the desired sub-optimality. ...
grant number 993/17) and the Yandex Initiative for Machine Learning at Tel Aviv University. ...
arXiv:2106.11519v1
fatcat:mh2pfxy62rfhfjf2z3prbutsje
Learning Bellman Complete Representations for Offline Policy Evaluation
[article]
2022
arXiv
pre-print
We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). ...
Our ablations show that both linear Bellman complete and coverage components of our method are crucial. ...
We thank Rahul Kidambi, Ban Kawas, and the anonymous reviewers for useful discussions and feedback. ...
arXiv:2207.05837v1
fatcat:v5fqjvf26jeedc3nywcetqmc7q
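Linear Bellman completeness, the representational target in this entry, can be stated as follows for a fixed evaluation policy $\pi$ (standard definition, notation ours): for every weight vector $w$ there exists $w'$ such that

$$ \phi(s,a)^\top w' \;=\; r(s,a) + \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[ \phi\big(s', \pi(s')\big)^\top w \big] \quad \text{for all } (s,a), $$

i.e. the Bellman backup of any linear function of the features is again linear in the features; this closure property is what makes least-squares style OPE estimators (FQE/LSTD) well behaved.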
Online Sparse Reinforcement Learning
[article]
2021
arXiv
pre-print
where N is the number of episodes and s is the sparsity level. ...
We investigate the hardness of online reinforcement learning in fixed horizon, sparse linear Markov decision process (MDP), with a special focus on the high-dimensional regime where the ambient dimension ...
Acknowledgement We greatly thank Akshay Krishnamurthy for pointing out the FLAMBE [Agarwal et al., 2020a] work and the role of action space size, and Yasin Abbasi-Yadkori for proofreading. ...
arXiv:2011.04018v4
fatcat:qeewsftdiveythbb4qq36y5e2i
The Statistical Complexity of Interactive Decision Making
[article]
2022
arXiv
pre-print
the statistical complexity of learning. ...
This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern ...
Acknowledgements We thank Ayush Sekhari and Karthik Sridharan for helpful discussions, and thank Zak Mhammedi for useful comments and feedback. ...
arXiv:2112.13487v2
fatcat:d6ruf4l5xvd2log4odgubsagki
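The complexity measure this line of work develops is the Decision-Estimation Coefficient (DEC); roughly, and in our paraphrase, for a model class $\mathcal{M}$ and a reference model $\bar{M}$ it takes the form

$$ \mathrm{dec}_{\gamma}(\mathcal{M}, \bar{M}) \;=\; \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}} \; \mathbb{E}_{\pi \sim p}\Big[ f^{M}(\pi_{M}) - f^{M}(\pi) - \gamma \, D_{\mathrm{H}}^{2}\big( M(\pi), \bar{M}(\pi) \big) \Big], $$

trading off the regret the learner would suffer if $M$ were the true model against how well the data generated by playing $p$ distinguishes $M$ from $\bar{M}$ (measured in squared Hellinger distance); the paper's lower and upper regret bounds are stated in terms of this quantity.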