
Going Beyond Linear RL: Sample Efficient Neural Function Approximation [article]

Baihe Huang and Kaixuan Huang and Sham M. Kakade and Jason D. Lee and Qi Lei and Runzhe Wang and Jiaqi Yang
2021 arXiv   pre-print
While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural net approximations of the Q functions.  ...  Deep Reinforcement Learning (RL) powered by neural net approximation of the Q function has had enormous empirical success.  ...
arXiv:2107.06466v2 fatcat:oqvs6lda5bfn5it5wohrwyr7yy

Instabilities of Offline RL with Pre-Trained Neural Representation [article]

Ruosong Wang, Yifan Wu, Ruslan Salakhutdinov, Sham M. Kakade
2021 arXiv   pre-print
In particular, our methodology explores these ideas when using features from pre-trained neural networks, in the hope that these representations are powerful enough to permit sample-efficient offline RL  ...  The implications of these results, both from a theoretical and an empirical perspective, are that successful offline RL (where we seek to go beyond the low distribution shift regime) requires substantially  ...  ..., V^π) approximately, using the collected dataset D, with as few samples as possible. Linear Function Approximation. In this paper, we focus on offline RL with linear function approximation.  ...
arXiv:2103.04947v1 fatcat:k6ltnaqrlbdkvdc35ivqearqwa

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation [article]

Junhong Shen, Lin F. Yang
2021 arXiv   pre-print
Recently, deep reinforcement learning (RL) has achieved remarkable empirical success by integrating deep neural networks into RL frameworks.  ...  To mitigate these issues, we propose a theoretically principled nearest neighbor (NN) function approximator that can improve the value networks in deep RL methods.  ...  However, as samples accumulate, a neural network with sufficient training data can outperform other function approximators due to its generalization ability.  ... 
arXiv:2110.04422v1 fatcat:zmhekvxiynfqrhcxapknk4lupa

What are the Statistical Limits of Offline RL with Linear Function Approximation? [article]

Ruosong Wang, Dean P. Foster, Sham M. Kakade
2020 arXiv   pre-print
The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity  ...  Perhaps surprisingly, our main result shows that even if: (i) we have realizability in that the true value function of every policy is linear in a given set of features and (ii) our off-policy data has good  ...  Acknowledgments The authors would like to thank Akshay Krishnamurthy, Alekh Agarwal, Wen Sun, and Nan Jiang for numerous helpful discussions on offline RL.  ...
arXiv:2010.11895v1 fatcat:7adhepr6nfd5rkyeuurlmtcyfe

Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms [article]

Chi Jin, Qinghua Liu, Sobhan Miryoosefi
2021 arXiv   pre-print
Finding the minimal structural assumptions that empower sample-efficient learning is one of the most important research directions in Reinforcement Learning (RL).  ...  We show that the family of RL problems of low BE dimension is remarkably rich, which subsumes a vast majority of existing tractable RL problems including but not limited to tabular MDPs, linear MDPs, reactive  ...  Function approximation, especially based on deep neural networks, lies at the heart of the recent practical successes of RL in domains such as Atari (Mnih et al., 2013) , Go (Silver et al., 2016) , robotics  ... 
arXiv:2102.00815v4 fatcat:m3pqgwypb5bgdpmutugxlwkjky

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces [article]

Chi Jin, Qinghua Liu, Tiancheng Yu
2021 arXiv   pre-print
Modern reinforcement learning (RL) commonly engages practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy.  ...  While recent progress in RL theory addresses a rich set of RL problems with general function approximation, such successes are mostly restricted to the single-agent setting.  ...  While a recent line of works [24, 49, 56, 26, 16] significantly advances our understanding of RL with general function approximation, and provides sample-efficient guarantees for RL with kernels, neural  ...
arXiv:2106.03352v2 fatcat:pqf7skwwqbfeziny4c25akedvq

Offline RL Without Off-Policy Evaluation [article]

David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna
2021 arXiv   pre-print
Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation.  ...  Introduction An important step towards effective real-world RL is to improve sample efficiency.  ...  While beyond the scope of this work, we do think that better offline model selection procedures will be crucial to make offline RL more broadly applicable.  ... 
arXiv:2106.08909v3 fatcat:3ipj5t6vhvagrpk4kudflekq6i

CEM-RL: Combining evolutionary and gradient-based methods for policy search [article]

Aloïs Pourchot, Olivier Sigaud
2019 arXiv   pre-print
By contrast, the latter is more sample efficient, but the most sample-efficient variants are also rather unstable and highly sensitive to hyper-parameter setting.  ...  We show that CEM-RL benefits from several advantages over its competitors and offers a satisfactory trade-off between performance and sample efficiency.  ...  efficient deep RL algorithms.  ...
arXiv:1810.01222v3 fatcat:e7saewhrc5f3vj4vw2jsxjiyvu

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL [article]

Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor
2018 arXiv   pre-print
On the one hand, one of the biggest challenges for reinforcement learning is sample efficiency (Yu 2018).  ...  It is common to use one softmax output for the policy π(a_t|s_t; θ) head and one linear output for the value function V(s_t; θ_v) head, with all non-output layers shared.  ...  After the convolutional and dense layers, we used ELU activation functions. Neural network architectures were not tuned.  ...
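The shared-trunk head arrangement described in this snippet (a common A3C-style layout) can be sketched as follows. This is an illustrative NumPy sketch, not code from the paper; the layer sizes and initialization are assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

class SharedActorCritic:
    """Sketch of the layout the snippet describes: all non-output
    layers shared, a softmax policy head, and a linear value head.
    Sizes are illustrative, not taken from the paper."""

    def __init__(self, obs_dim, n_actions, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W_shared = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.W_pi = rng.normal(0.0, 0.1, (hidden, n_actions))  # policy head
        self.W_v = rng.normal(0.0, 0.1, (hidden, 1))           # value head

    def forward(self, s):
        h = np.tanh(s @ self.W_shared)   # shared trunk (non-output layers)
        pi = softmax(h @ self.W_pi)      # pi(a_t | s_t; theta)
        v = (h @ self.W_v).item()        # V(s_t; theta_v)
        return pi, v
```

Both heads read the same trunk features, so gradients from the policy and value losses update the shared layers jointly.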
arXiv:1812.00045v1 fatcat:u7vv6pzp6rdqzcfqexutlpeqiu

Curious iLQR: Resolving Uncertainty in Model-based RL [article]

Sarah Bechtle, Yixin Lin, Akshara Rai, Ludovic Righetti, Franziska Meier
2019 arXiv   pre-print
Introduction Model-based reinforcement learning holds promise for sample-efficient learning on real robots [1] .  ...  The dynamics are linearized and the cost is quadratized along the nominal trajectory (x_k^n, u_k^n), in terms of the state and control deviations δx_k = x_k − x_k^n and δu_k = u_k − u_k^n, leading to the linear dynamics approximation  ...  Next, we briefly present the details of the risk-sensitive iLQR algorithm, following [7] , [26] and [6] , making a quadratic approximation of the value function Ψ.  ...
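In standard iLQR notation, the deviation variables referenced in this snippet are defined around a nominal trajectory (x_k^n, u_k^n); a cleaned-up statement (standard textbook form, not verbatim from the paper) is:

```latex
% Deviations from the nominal trajectory (x_k^n, u_k^n):
\delta x_k = x_k - x_k^n, \qquad \delta u_k = u_k - u_k^n
% Linearized dynamics about the nominal trajectory, for x_{k+1} = f(x_k, u_k):
\delta x_{k+1} \approx A_k\,\delta x_k + B_k\,\delta u_k,
\quad A_k = \left.\frac{\partial f}{\partial x}\right|_{x_k^n,\,u_k^n},
\quad B_k = \left.\frac{\partial f}{\partial u}\right|_{x_k^n,\,u_k^n}
```

The quadratic expansion of the cost in the same deviation variables is what makes the backward pass of iLQR a sequence of linear-quadratic subproblems.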
arXiv:1904.06786v2 fatcat:hglaqdinhzf3tchhydokno7jyy

Long-Range Indoor Navigation with PRM-RL [article]

Anthony Francis and Aleksandra Faust and Hao-Tien Lewis Chiang and Jasmine Hsu and J. Chase Kew and Marek Fiser and Tsang-Wei Edward Lee
2020 arXiv   pre-print
Here we use Probabilistic Roadmaps (PRMs) as the sampling-based planner, and AutoRL as the reinforcement learning method in the indoor navigation context.  ...  We achieve this with PRM-RL, a hierarchical robot navigation method in which reinforcement learning agents that map noisy sensors to robot controls learn to solve short-range obstacle avoidance tasks,  ...  Sampling-based planners, such as Probabilistic Roadmaps (PRMs) [39] and Rapidly Exploring Random Trees (RRTs) [42] , [44] , plan efficiently by approximating the topology of the configuration space  ... 
arXiv:1902.09458v2 fatcat:x5jqwb3gjrdshi4z6zb3elfgym

Safety Augmented Value Estimation from Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks [article]

Brijen Thananjeyan, Ashwin Balakrishna, Ugo Rosolia, Felix Li, Rowan McAllister, Joseph E. Gonzalez, Sergey Levine, Francesco Borrelli, Ken Goldberg
2020 arXiv   pre-print
Results suggest that SAVED outperforms prior methods in terms of success rate, constraint satisfaction, and sample efficiency, making it feasible to safely learn a control policy directly on a real robot  ...  Reinforcement learning (RL) for robotics is challenging due to the difficulty in hand-engineering a dense cost function, which can lead to unintended behavior, and dynamical uncertainty, which makes exploration  ...  Results suggest that SAVED is more sample efficient and has higher success and constraint satisfaction rates than all RL baselines and can be efficiently and safely trained on a real robot.  ... 
arXiv:1905.13402v8 fatcat:vr3bdjy5r5alpnas7flvkpqdkq

Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L) [article]

Igor Halperin
2021 arXiv   pre-print
The suggested approach, dubbed 'SciPhy RL', thus reduces DOCTR-L to solving neural PDEs from data.  ...  A data-driven solution of the soft HJB equation uses methods of Neural PDEs and Physics-Informed Neural Networks developed in the field of Scientific Machine Learning (SciML).  ...  The latter problem can be estimated based on a moderate number of samples, giving rise to sample-efficient schemes.  ... 
arXiv:2104.01040v1 fatcat:ygqqtfagt5a23m57d5gpgcsbya

Flatland-RL : Multi-Agent Reinforcement Learning on Trains [article]

Sharada Mohanty, Erik Nygren, Florian Laurent, Manuel Schneider, Christian Scheller, Nilabha Bhattacharya, Jeremy Watson, Adrian Egli, Christian Eichenberger, Christian Baumberger, Gereon Vienken, Irene Sturm (+2 others)
2020 arXiv   pre-print
Efficient automated scheduling of trains remains a major challenge for modern railway systems.  ...  In order to probe the potential of Machine Learning (ML) research on Flatland, we (1) ran a first series of RL and IL experiments and (2) designed and executed a public Benchmark at NeurIPS 2020 to engage  ...  For policy and value functions, we employed a small two-layer neural network (each with 256 hidden units and ReLU activation).  ...
arXiv:2012.05893v2 fatcat:au2esg6tgndj5dnqe4qvejvlym

TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL [article]

Clément Romac, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer
2021 arXiv   pre-print
In parallel to improving DRL algorithms themselves, Automatic Curriculum Learning (ACL) studies how teacher algorithms can train DRL agents more efficiently by adapting task selection to their evolving abilities  ...  This forces the task selection function to propose tasks where the agent performs well while progressively going towards the target task space.  ...  CASE STUDY: SAMPLE EFFICIENCY In this section, we take a look at the sample efficiency of the different ACL methods using their performance after only 5 million steps.  ...
arXiv:2103.09815v2 fatcat:qtsp6ghsgrd47fwgznnqdwrtsu
Showing results 1 — 15 out of 3,743.