354 Hits in 2.7 sec

Impact of Representation Learning in Linear Bandits [article]

Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du
2021 arXiv   pre-print
We study how representation learning can improve the efficiency of bandit problems.  ...  Furthermore, we extend our algorithm to the infinite-action setting and obtain a corresponding regret bound which demonstrates the benefit of representation learning in certain regimes.  ...  Our Contributions We give the first rigorous characterization on the benefit of representation learning for multi-task linear bandits.  ... 
arXiv:2010.06531v2 fatcat:czojjeclprhsdfpnyd2whrjwvm
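The snippet above concerns the benefit of a shared representation across tasks in linear bandits. As a rough illustrative sketch (not the paper's algorithm; all names and dimensions here are hypothetical), the usual model is a reward r = x^T B θ_m with a low-rank feature map B shared by all tasks, so that each task only needs to learn a k-dimensional parameter instead of a d-dimensional one:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_tasks, n_actions = 20, 3, 5, 50

# Shared low-rank representation B (d x k, orthonormal columns)
# and per-task parameters theta_m in R^k.
B = np.linalg.qr(rng.normal(size=(d, k)))[0]
thetas = rng.normal(size=(n_tasks, k))

actions = rng.normal(size=(n_actions, d))  # common action set

def expected_reward(m, x):
    """Multi-task linear bandit reward model: r = x^T B theta_m."""
    return x @ B @ thetas[m]

# With B known, each task reduces to a k-dimensional linear bandit,
# which is the intuition behind the improvement from d to k in the regret.
best = [int(np.argmax([expected_reward(m, x) for x in actions]))
        for m in range(n_tasks)]
print(best)
```

The point of the model is only that the d-dimensional contexts interact with the rewards through the k-dimensional bottleneck B, so samples from every task help pin down B jointly.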

Near-optimal Representation Learning for Linear Bandits and Linear RL [article]

Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang
2021 arXiv   pre-print
To the best of our knowledge, this is the first theoretical result that characterizes the benefits of multi-task representation learning for exploration in RL with function approximation.  ...  This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation.  ...  Learning with good feature representations Jiaqi Yang, Wei Hu, Jason D. Lee, and Simon S. Du. Provable benefits of representation learning in linear bandits, 2020.  ... 
arXiv:2102.04132v1 fatcat:xkp3fbqsorhsjde4x5pti4y7me

Collaborative Learning and Personalization in Multi-Agent Stochastic Linear Bandits [article]

Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
2021 arXiv   pre-print
We consider the problem of minimizing regret in an N-agent heterogeneous stochastic linear bandits framework, where the agents (users) are similar but not all identical.  ...  In the personalization framework, we introduce a natural algorithm where the personal bandit instances are initialized with the estimates of the global average model.  ...  Learning Algorithm: We propose the Successive Clustering of Linear Bandits (SCLB) algorithm in Algorithm 1.  ... 
arXiv:2106.08902v1 fatcat:tdwhsu4yvbchzpiqwyfvujzxcy
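The snippet above mentions initializing each personal bandit instance at the estimate of the global average model. A toy sketch of why this warm start helps when agents are similar (all parameters and noise levels below are made up for illustration): averaging many noisy per-agent estimates shrinks the noise, so the global average is a better starting point than any single agent's own early estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_agents = 6, 8

# Heterogeneous agents: each theta_i is a small perturbation of a global model.
theta_global = rng.normal(size=d)
thetas = theta_global + 0.05 * rng.normal(size=(n_agents, d))

# Noisy per-agent estimates from a short individual learning phase.
local_est = thetas + 0.5 * rng.normal(size=(n_agents, d))

# Warm start: initialize every personal instance at the global average
# estimate, then let each agent's own data refine it.
global_est = local_est.mean(axis=0)
personal_init = np.tile(global_est, (n_agents, 1))

err_local = np.linalg.norm(local_est - thetas, axis=1).mean()
err_init = np.linalg.norm(personal_init - thetas, axis=1).mean()
print(err_init < err_local)
```

Averaging cuts the estimation noise by roughly a factor of sqrt(n_agents), at the cost of a small bias from heterogeneity; when agents are similar, the trade is favorable.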

Provably Efficient Representation Learning in Low-rank Markov Decision Processes [article]

Weitong Zhang and Jiafan He and Dongruo Zhou and Amy Zhang and Quanquan Gu
2021 arXiv   pre-print
However, existing provable reinforcement learning algorithms with linear function approximation often assume the feature representation is known and fixed.  ...  In order to understand how representation learning can improve the efficiency of RL, we study representation learning for a class of low-rank Markov Decision Processes (MDPs) where the transition kernel  ...  This explains the benefit of representation learning for reinforcement learning.  ... 
arXiv:2106.11935v1 fatcat:tur44wmigrc3nkscfhoachbcxy

Balanced Linear Contextual Bandits

Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
2019 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-19)
We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation  ...  We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state of the art theoretical guarantees.  ...  This research is generously supported by ONR grant N00014-17-1-2131, by the Sloan Foundation, by the "Arvanitidis in Memory of William K.  ... 
doi:10.1609/aaai.v33i01.33013445 fatcat:cvzn2dxls5akzgdweu57qy6co4
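The abstract above integrates balancing methods from causal inference into the bandit's estimation step. A minimal, hypothetical sketch of the underlying mechanic (not the paper's exact estimator): reweight the rounds where an arm was actually played by the inverse of the probability it was played, so the weighted sample looks like a balanced population.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 4
theta = np.array([1.0, -2.0, 0.5, 3.0])

X = rng.normal(size=(n, d))
# Logged propensities: probability each context was assigned to our arm.
p = 1.0 / (1.0 + np.exp(-X[:, 0]))   # assignment depends on the context
chosen = rng.random(n) < p           # biased logging policy
Xo, po = X[chosen], p[chosen]
y = Xo @ theta + 0.1 * rng.normal(size=Xo.shape[0])

# Inverse-propensity-weighted least squares: the weights 1/p undo the
# selection bias of the logging policy, the balancing idea from causal
# inference that the paper builds into its bandit estimators.
w = 1.0 / po
theta_ipw = np.linalg.solve(Xo.T @ (w[:, None] * Xo), Xo.T @ (w * y))
print(np.round(theta_ipw, 2))
```

In a well-specified linear model the unweighted regression is also consistent; balancing matters for robustness when the model is misspecified or the assignment is strongly context-dependent.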

Balanced Linear Contextual Bandits [article]

Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
2018 arXiv   pre-print
We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation  ...  We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state of the art theoretical guarantees.  ...  This research is generously supported by ONR grant N00014-17-1-2131, by the Sloan Foundation, by the "Arvanitidis in Memory of William K.  ... 
arXiv:1812.06227v1 fatcat:fjvhmzl3kzb3zfpehdz65aipzu

Neural Contextual Bandits with UCB-based Exploration [article]

Dongruo Zhou and Lihong Li and Quanquan Gu
2020 arXiv   pre-print
We also show the algorithm is empirically competitive against representative baselines in a number of benchmarks.  ...  To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.  ...  The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.  ... 
arXiv:1911.04462v3 fatcat:3u6erwajyfbnxpn3wvehrk6j3u
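The entry above (NeuralUCB) attaches a UCB exploration bonus to a neural-network reward estimate. In the linear special case this reduces to LinUCB-style exploration, which the following self-contained sketch illustrates (contexts, noise level, and the bonus scale alpha are all assumed here, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, n_actions, alpha, lam = 5, 500, 10, 1.0, 1.0

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
A = lam * np.eye(d)   # regularized Gram matrix
b = np.zeros(d)

regret = 0.0
for t in range(T):
    X = rng.normal(size=(n_actions, d))   # this round's action features
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # UCB score: estimated reward plus a bonus from the confidence ellipsoid,
    # sqrt(x^T A^{-1} x), which shrinks in well-explored directions.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
    a = int(np.argmax(X @ theta_hat + alpha * bonus))
    r = X[a] @ theta_star + 0.1 * rng.normal()
    A += np.outer(X[a], X[a])
    b += r * X[a]
    regret += (X @ theta_star).max() - X[a] @ theta_star
```

NeuralUCB's contribution is computing an analogous bonus from the network's gradient features, which is what yields the near-optimal regret guarantee the abstract refers to.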

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [article]

Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang
2021 arXiv   pre-print
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network  ...  For the low-rank generalized linear bandit problem, we provide a minimax-optimal algorithm in the dimension, refuting both conjectures in [LMT21, JWWN19].  ...  Provable benefits of representation learning in linear bandits. arXiv preprint arXiv:2010.06531, 2020. [ZLKB20] Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, and Emma Brunskill.  ... 
arXiv:2107.04518v1 fatcat:4vrtlaz67zhg3gqvwktlscwtz4

Reinforcement learning with multi-fidelity simulators

Mark Cutler, Thomas J. Walsh, Jonathan P. How
2014 IEEE International Conference on Robotics and Automation (ICRA)
We present a framework for reinforcement learning (RL) in a scenario where multiple simulators are available with decreasing amounts of fidelity to the real-world learning scenario.  ...  reward representations.  ...  In addition, our theoretical results hold for a large class of representations such as linear and Gaussian-noise dynamics covered by the KWIK learning framework [7] .  ... 
doi:10.1109/icra.2014.6907423 dblp:conf/icra/CutlerWH14 fatcat:gococffz7nbornymdwgcc2fuq4

Randomized Value Functions via Posterior State-Abstraction Sampling [article]

Dilip Arumugam, Benjamin Van Roy
2021 arXiv   pre-print
In empirically validating our approach, we find that substantial performance gains lie in the multi-task setting where tasks share a common, low-dimensional representation.  ...  State abstraction has been an essential tool for dramatically improving the sample efficiency of reinforcement-learning algorithms.  ...  the benefits of φ in RL.  ... 
arXiv:2010.02383v2 fatcat:3he5uir2krcevihukp4hepk5xe

Generalization and Exploration via Randomized Value Functions [article]

Ian Osband, Benjamin Van Roy, Zheng Wen
2016 arXiv   pre-print
Further, we establish an upper bound on the expected regret of RLSVI that demonstrates near-optimality in a tabula rasa learning context.  ...  More broadly, our results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective  ...  Obviously, this algorithm aims to learn the myopic policy. In this subsection, we describe the linear contextual bandit algorithm.  ... 
arXiv:1402.0635v3 fatcat:aoqndidaz5gnbkg65punxvf4ge
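The RLSVI entry above explores via randomized value functions rather than explicit bonuses. In the myopic (contextual bandit) case the snippet mentions, this amounts to sampling a plausible parameter from a Gaussian posterior and acting greedily on it, as in the following hedged sketch (dimensions and noise scales are assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, n_actions, sigma, lam = 5, 500, 10, 0.5, 1.0

theta_star = rng.normal(size=d)
A = lam * np.eye(d)
b = np.zeros(d)

for t in range(T):
    X = rng.normal(size=(n_actions, d))
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # Randomized value function: sample a parameter from the Gaussian
    # posterior instead of adding a UCB-style exploration bonus.
    cov = sigma**2 * (A_inv + A_inv.T) / 2   # symmetrize for numerical safety
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    a = int(np.argmax(X @ theta_tilde))
    r = X[a] @ theta_star + 0.1 * rng.normal()
    A += np.outer(X[a], X[a])
    b += r * X[a]

print(np.round(theta_hat, 2))
```

The randomness in theta_tilde drives exploration: directions with little data have large posterior variance, so occasionally the sample makes an under-explored action look best. RLSVI extends this idea from one step to full episodic value iteration.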

Model Selection in Batch Policy Optimization [article]

Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai
2021 arXiv   pre-print
We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade-off in order to be competitive  ...  positive results available in supervised learning.  ...  Provably efficient representation learning in low-rank markov decision processes. arXiv preprint arXiv:2106.11935, 2021. [ZJ21] Siyuan Zhang and Nan Jiang.  ... 
arXiv:2112.12320v1 fatcat:thrtvpivf5gy5h3i7cff5kzmvu

Real-World Reinforcement Learning via Multifidelity Simulators

Mark Cutler, Thomas J. Walsh, Jonathan P. How
2015 IEEE Transactions on Robotics
The framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing a learning agent to choose to run trajectories at the lowest level simulator  ...  We present a framework for efficient RL in a scenario where multiple simulators of a target task are available, each with varying levels of fidelity.  ...  Theoretical results in this paper are tied to the general KWIK framework and so apply not only to tabular models of environments, but also to a large class of representations such as linear and Gaussian-noise  ... 
doi:10.1109/tro.2015.2419431 fatcat:bmisjruwufhdlcfywvg5nnebgy

Deep Exploration via Bootstrapped DQN [article]

Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy
2016 arXiv   pre-print
We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment.  ...  We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions.  ...  Surprisingly, RLSVI recovers state of the art guarantees in the setting with tabular basis functions, but its performance is crucially dependent upon a suitable linear representation of the value function  ... 
arXiv:1602.04621v3 fatcat:pfstw4ib3vebrennwffyqlfxuu

Learning Near Optimal Policies with Low Inherent Bellman Error [article]

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill
2020 arXiv   pre-print
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show  ...  Finally, the algorithm reduces to the celebrated LinUCB when H=1 but with a different choice of the exploration parameter that allows handling misspecified contextual linear bandits.  ...  Provably efficient reinforcement learning with linear function ap- proximation. In Conference on Learning Theory, 2020. Kolter, J. Z. The fixed points of off-policy td.  ... 
arXiv:2003.00153v3 fatcat:2kojpcgskra4hjvhv4hvdtlx3q
Showing results 1 — 15 out of 354 results