
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [article]

Kefan Dong, Jiaqi Yang, Tengyu Ma
2021 arXiv   pre-print
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations. ... For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOlin), which provably converges to a local maximum with sample complexity that ... Lee, Colin Wei, Akshay Krishnamurthy and Alekh Agarwal for helpful discussions. TM is also partially supported by the Google Faculty Award, Lam Research, and ...
arXiv:2102.04168v4

Going Beyond Linear RL: Sample Efficient Neural Function Approximation [article]

Baihe Huang and Kaixuan Huang and Sham M. Kakade and Jason D. Lee and Qi Lei and Runzhe Wang and Jiaqi Yang
2021 arXiv   pre-print
Deep Reinforcement Learning (RL) powered by neural net approximation of the Q function has had enormous empirical success. ... Our first result is a computationally and statistically efficient algorithm in the generative model setting under completeness for two-layer neural networks. ... Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature. arXiv preprint arXiv:2102.04168, 2021. ...
arXiv:2107.06466v2