
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs [article]

Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
2020 arXiv pre-print
Structurally, we make precise connections between these low rank MDPs and latent variable models, showing how they significantly generalize prior formulations for representation learning in RL.  ...  Algorithmically, we develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.  ...  We address the question of learning the representation φ in a low rank MDP. To this end our contributions are both structural and algorithmic. 1. Expressiveness of low rank MDPs.  ... 
arXiv:2006.10814v2 fatcat:stvbyny3prbrbddy7ye74gdnza
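For readers skimming these results: the low rank transition model studied by FLAMBE and most of the papers listed here factorizes as P(s'|s,a) = ⟨φ(s,a), μ(s')⟩ with φ(s,a), μ(s') ∈ R^d. The NumPy sketch below is a hypothetical illustration, not code from any listed paper. It builds a small tabular instance of the latent-variable special case, where φ(s,a) is assumed to be a distribution over d latent factors and each μ_k a distribution over next states, and it checks that the stacked transition matrix has rank at most d.

```python
import numpy as np


def make_low_rank_mdp(num_states, num_actions, d, seed=0):
    """Build P[s, a, s'] = <phi(s, a), mu(s')> with rank at most d.

    Assumption (latent-variable special case): phi(s, a) is a probability
    vector over d latent factors and mu[k] is a distribution over next
    states, so every P[s, a] is a mixture of d next-state distributions.
    """
    rng = np.random.default_rng(seed)
    # phi has shape (S, A, d): one simplex vector per state-action pair.
    phi = rng.dirichlet(np.ones(d), size=(num_states, num_actions))
    # mu has shape (d, S): one next-state distribution per latent factor.
    mu = rng.dirichlet(np.ones(num_states), size=d)
    # Inner product over the latent index k gives P with shape (S, A, S).
    P = np.einsum("sak,kt->sat", phi, mu)
    return phi, mu, P


if __name__ == "__main__":
    S, A, d = 20, 3, 4
    phi, mu, P = make_low_rank_mdp(S, A, d)
    # Stack (s, a) pairs as rows: an (S*A) x S matrix of rank <= d.
    print(np.linalg.matrix_rank(P.reshape(S * A, S)))
```

Because each row is a convex combination of the μ_k, every P[s, a] sums to one. General low rank MDPs drop the nonnegativity of φ and μ and keep only the inner-product form; the representation-learning problem in these papers is recovering φ when it is unknown.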

Representation Learning for Online and Offline RL in Low-rank MDPs [article]

Masatoshi Uehara, Xuezhou Zhang, Wen Sun
2022 arXiv pre-print
Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings.  ...  For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB  ...  Acknowledgement The authors would like to thank Alekh Agarwal, Praneeth Netrapalli and Ming Yin for valuable feedback.  ... 
arXiv:2110.04652v3 fatcat:uqexoogxkbgjboja4ik5cj3mgy

Provable Benefits of Representational Transfer in Reinforcement Learning [article]

Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
2022 arXiv pre-print
The sample complexity is close to knowing the ground truth features in the target task, and comparable to prior representation learning results in the source tasks.  ...  We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy  ...  Acknowledgement We thank Masatoshi Uehara for the insightful discussions at the early stage of this project.  ... 
arXiv:2205.14571v1 fatcat:inuv5kbhnba7tc7fontmyemsjq

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach [article]

Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
2022 arXiv pre-print
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics  ...  BRIEE interleaves latent state discovery, exploration, and exploitation, and can provably learn a near-optimal policy with sample complexity scaling polynomially in the number of latent states  ...  The second equality is specific to the block structure of our features and does not hold in general low-rank MDPs.  ... 
arXiv:2202.00063v2 fatcat:a4tvd3cq7zb37evpyr242vz2le
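Block MDPs, the setting of BRIEE, are a special case of low-rank MDPs: each observation x is emitted by exactly one latent state f(x), so P(x'|x,a) = Σ_{z'} q(x'|z') T(z'|f(x),a), a factorization whose rank is at most the number of latent states. The NumPy sketch below illustrates that fact under stated assumptions; the names decoder, T and q and the random construction are hypothetical and are not BRIEE's features or code.

```python
import numpy as np


def make_block_mdp(num_latent, num_actions, obs_per_latent, seed=0):
    """Random tabular block MDP; returns the decoder f and P[x, a, x'].

    Assumption: observations are split into disjoint blocks, one block
    per latent state, so the latent state is a function of the observation.
    """
    rng = np.random.default_rng(seed)
    num_obs = num_latent * obs_per_latent
    # decoder[x] = f(x): observation index -> latent state (disjoint blocks).
    decoder = np.repeat(np.arange(num_latent), obs_per_latent)
    # Latent dynamics T[z, a, z'] = T(z' | z, a).
    T = rng.dirichlet(np.ones(num_latent), size=(num_latent, num_actions))
    # Emissions q[z', x'] = q(x' | z'), supported only on the block of z'.
    q = np.zeros((num_latent, num_obs))
    for z in range(num_latent):
        block = np.flatnonzero(decoder == z)
        q[z, block] = rng.dirichlet(np.ones(obs_per_latent))
    # phi(x, a) = T(. | f(x), a): shape (X, A, Z); mu(x') = q(x' | .).
    phi = T[decoder]
    P = np.einsum("xaz,zy->xay", phi, q)
    return decoder, P


if __name__ == "__main__":
    decoder, P = make_block_mdp(num_latent=3, num_actions=2, obs_per_latent=4)
    X, A, _ = P.shape
    # Rank of the (X*A) x X transition matrix is bounded by the latent count.
    print(np.linalg.matrix_rank(P.reshape(X * A, X)))
```

Observations that decode to the same latent state have identical next-observation distributions, which is the structure a block MDP learner can exploit and which, as the snippet notes, does not hold in general low-rank MDPs.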

Provably Efficient Representation Learning in Low-rank Markov Decision Processes [article]

Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu
2021 arXiv pre-print
In order to understand how representation learning can improve the efficiency of RL, we study representation learning for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel  ...  The success of deep reinforcement learning (DRL) is due to the power of learning a representation that is suitable for the underlying exploration and exploitation task.  ...  First, currently we are studying a special class (Yang and Wang, 2020) of low-rank MDPs instead of the general low-rank MDP (Yang and Wang, 2019; Jin et al., 2020).  ... 
arXiv:2106.11935v1 fatcat:tur44wmigrc3nkscfhoachbcxy

Model-free Representation Learning and Exploration in Low-rank MDPs [article]

Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
2022 arXiv pre-print
The low rank MDP has emerged as an important model for studying representation learning and exploration in reinforcement learning.  ...  In this work, we present the first model-free representation learning algorithms for low rank MDPs.  ...  Acknowledgements Part of this work was done while AM was at University of Michigan and was supported in part by a grant from the Open Philanthropy Project to the Center for Human-Compatible AI, and in  ... 
arXiv:2102.07035v2 fatcat:dizamv2qazarxggnttuwhr6pwu

Bilinear Classes: A Structural Framework for Provable Generalization in RL [article]

Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang
2021 arXiv pre-print
This work introduces Bilinear Classes, a new structural framework, which permits generalization in reinforcement learning in a wide variety of settings through the use of function approximation.  ...  Furthermore, this framework also extends to the infinite dimensional (RKHS) setting: for the Linear Q^*/V^* model, linear MDPs, and linear mixture MDPs, we provide sample complexities that have no  ...  We thank Akshay Krishnamurthy for a discussion regarding Q/V-Bellman rank.  ... 
arXiv:2103.10897v3 fatcat:wlr3tmthtjfnzhfxmefqts5f6e

Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage [article]

Masatoshi Uehara, Wen Sun
2021 arXiv pre-print
Two notable examples are: (1) low-rank MDP with representation learning where the partial coverage condition is defined using a relative condition number measured by the unknown ground truth feature representation  ...  We then demonstrate that this algorithmic framework can be applied to many specialized Markov Decision Processes where additional structural assumptions can further refine the concept of partial coverage  ...  FLAMBE: Structural complexity and representation learning of low rank MDPs. In Advances in Neural Information Processing Systems, volume 33, pages 20095-20107, 2020b.  ... 
arXiv:2107.06226v2 fatcat:samqjfn7crgmxaeqhf3yx3snbi

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [article]

Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
2022 arXiv pre-print
To narrow such a gap, we study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.  ...  For both models, we propose to extract the correct feature representations of the low-rank model by minimizing a contrastive loss.  ...  Agarwal, A., Kakade, S., Krishnamurthy, A., and Sun, W. FLAMBE: Structural complexity and representation learning of low rank MDPs. arXiv preprint arXiv:2006.10814, 2020. Moulin, H. and Vial, J.  ... 
arXiv:2207.14800v1 fatcat:eg555bavt5gb7ftahu4hhdbobi

Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL [article]

Ruiquan Huang, Jing Yang, Yingbin Liang
2022 arXiv pre-print
We then particularize the SWEET framework to the tabular and the low-rank MDP settings, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively.  ...  While the primary goal of the exploration phase in reward-free reinforcement learning (RF-RL) is to reduce the uncertainty in the estimated model with the minimum number of trajectories, in practice, the agent  ...  Agarwal et al. (2020) study low-rank MDPs and propose FLAMBE, whose learning objective can be translated to a reward-free learning goal with sample complexity Õ(H^{22} d^7 A^9/ϵ^{10}).  ... 
arXiv:2206.14057v1 fatcat:su7lakzgofaetnks56ajg7wthm

Robust Policy Gradient against Strong Data Corruption [article]

Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
2021 arXiv pre-print
We study the problem of robust reinforcement learning under adversarial corruption on both rewards and transitions.  ...  Our attack model assumes an adaptive adversary who can arbitrarily corrupt the reward and transition at every step within an episode, for at most ϵ-fraction of the learning episodes.  ...  FLAMBE: Structural complexity and representation learning of low rank MDPs. Advances in Neural Information Processing Systems, 33, 2020b.  ... 
arXiv:2102.05800v3 fatcat:4xzl4rurpnflhbx3y2fqmhhsne

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations [article]

Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
2021 arXiv pre-print
We provide an algorithm for this setting whose error is bounded in terms of the rank d of the underlying MDP.  ...  Specifically, our algorithm enjoys a sample complexity bound of O((H^{4d} K^{3d} log|Π|)/ϵ^2) where H is the length of episodes, K is the number of actions and ϵ>0 is the desired sub-optimality.  ...  grant number 993/17) and the Yandex Initiative for Machine Learning at Tel Aviv University.  ... 
arXiv:2106.11519v1 fatcat:mh2pfxy62rfhfjf2z3prbutsje

Learning Bellman Complete Representations for Offline Policy Evaluation [article]

Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun
2022 arXiv pre-print
We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE).  ...  Our ablations show that both linear Bellman complete and coverage components of our method are crucial.  ...  We thank Rahul Kidambi, Ban Kawas, and the anonymous reviewers for useful discussions and feedback.  ... 
arXiv:2207.05837v1 fatcat:v5fqjvf26jeedc3nywcetqmc7q

Online Sparse Reinforcement Learning [article]

Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
2021 arXiv pre-print
where N is the number of episodes and s is the sparsity level.  ...  We investigate the hardness of online reinforcement learning in fixed horizon, sparse linear Markov decision process (MDP), with a special focus on the high-dimensional regime where the ambient dimension  ...  Acknowledgement We greatly thank Akshay Krishnamurthy for pointing out the FLAMBE [Agarwal et al., 2020a] work and the role of action space size, and Yasin Abbasi-Yadkori for proofreading.  ... 
arXiv:2011.04018v4 fatcat:qeewsftdiveythbb4qq36y5e2i

The Statistical Complexity of Interactive Decision Making [article]

Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin
2022 arXiv pre-print
the statistical complexity of learning.  ...  This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern  ...  Acknowledgements We thank Ayush Sekhari and Karthik Sridharan for helpful discussions, and thank Zak Mhammedi for useful comments and feedback.  ... 
arXiv:2112.13487v2 fatcat:d6ruf4l5xvd2log4odgubsagki
Showing results 1-15 of 19