39 Hits in 4.8 sec

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits [article]

Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire
2014 arXiv   pre-print
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen  ...  By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes.  ...
arXiv:1402.0555v2 fatcat:3ehdob473jh4fijtghdj2g7rpe
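
The snippet above states the contextual bandit protocol precisely: a context arrives, the learner picks one of $K$ actions, and only that action's reward is observed. Below is a minimal sketch of that interaction loop with a made-up logistic reward model and a placeholder uniform policy; every name and parameter here is illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance: K actions, d-dimensional contexts,
# rewards drawn from a logistic model (purely illustrative).
K, d, T = 5, 10, 1000
theta_true = rng.normal(size=(K, d))

def observe_context():
    return rng.normal(size=d)

def draw_reward(x, a):
    p = 1.0 / (1.0 + np.exp(-x @ theta_true[a]))
    return float(rng.random() < p)

# The protocol from the abstract: at each round the learner sees a
# context, picks one of K actions, and observes only that arm's reward.
total_reward = 0.0
for t in range(T):
    x = observe_context()
    a = int(rng.integers(K))   # placeholder policy: uniform exploration
    r = draw_reward(x, a)      # bandit feedback: other arms stay hidden
    total_reward += r
print(f"average reward: {total_reward / T:.3f}")
```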

Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability [article]

David Simchi-Levi, Yunzong Xu
2021 arXiv   pre-print
We design a fast and simple algorithm that achieves the statistically optimal regret with only ${O}(\log T)$ calls to an offline regression oracle across all $T$ rounds.  ...  This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.  ...  The authors would like to express sincere gratitude to Alekh Agarwal, Dylan Foster, Akshay Krishnamurthy, John Langford, Menglong Li, Alexander Rakhlin, Yining Wang and Yunbei Xu for helpful comments and  ... 
arXiv:2003.12699v5 fatcat:ylydspt4d5b4vf2br7o5ugjewi
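
This line of work converts a regression oracle's reward estimates into an action distribution via inverse gap weighting: arms are explored in proportion to how close their predicted reward is to the leader's. A minimal sketch of that rule follows; the estimates `f_hat` and the learning-rate parameter `gamma` are illustrative placeholders, not the paper's tuned quantities.

```python
import numpy as np

def igw_distribution(f_hat, gamma):
    """Inverse gap weighting: map estimated rewards for K arms to a
    sampling distribution; non-greedy arms get probability inversely
    proportional to their predicted gap from the best arm."""
    K = len(f_hat)
    best = int(np.argmax(f_hat))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (f_hat[best] - f_hat[a]))
    p[best] = 1.0 - p.sum()   # leftover mass goes to the greedy arm
    return p

# Example: estimates from some offline regression oracle (values made up).
probs = igw_distribution(np.array([0.9, 0.7, 0.4]), gamma=20.0)
print(probs, probs.sum())
```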

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [article]

Dylan J. Foster, Akshay Krishnamurthy
2021 arXiv   pre-print
In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire asked whether first-order guarantees are even possible for contextual bandits and -- if so -- whether they can be attained  ...  We give a resolution to this question by providing an optimal and efficient reduction from contextual bandits to online regression with the logarithmic (or, cross-entropy) loss.  ...  We also thank Sasha Rakhlin for providing Google Cloud credits used to run the experiments. References Naoki Abe and Philip M Long.  ... 
arXiv:2107.02237v1 fatcat:xvw6eti3frdwpnf2tt4tqt7mn4
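
The reduction described here plugs the bandit problem into an online regression oracle for the logarithmic (cross-entropy) loss. As a concrete stand-in for such an oracle, here is a minimal online learner under an assumed sigmoid-linear model; the model class and step size are illustrative, not the paper's construction.

```python
import numpy as np

class OnlineLogLossRegressor:
    """Online regression under log (cross-entropy) loss with a
    sigmoid-linear model: f(x) = sigmoid(w . x)."""
    def __init__(self, d, lr=0.1):
        self.w = np.zeros(d)
        self.lr = lr

    def predict(self, x):
        return 1.0 / (1.0 + np.exp(-self.w @ x))

    def update(self, x, r):
        # Gradient of -r*log(f) - (1-r)*log(1-f) w.r.t. w is (f - r) * x.
        f = self.predict(x)
        self.w -= self.lr * (f - r) * x
```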

Meta-Learning for Contextual Bandit Exploration [article]

Amr Sharaf, Hal Daumé III
2019 arXiv   pre-print
We describe MELEE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting.  ...  MELEE addresses this trade-off by learning a good exploration strategy for offline tasks based on synthetic data, on which it can simulate the contextual bandit setting.  ...  Taming the monster: A fast and simple algorithm for contextual bandits.  ... 
arXiv:1901.08159v1 fatcat:xly3sqkvsrex3ms6rq5i4tleyu
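
MELEE trains its exploration policy on simulated bandit tasks built from synthetic data. A common way to simulate bandit feedback from labeled examples is sketched below, under the standard one-if-correct reward convention; this is an assumed setup for illustration, not necessarily the paper's exact simulator.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bandit_round(x, label, policy):
    """Turn one supervised example into one bandit round: the policy
    picks an arm and only that arm's reward (1 if it matches the true
    label, else 0) is revealed."""
    a = policy(x)
    return a, float(a == label)

# Illustrative use with a uniform policy over 4 synthetic classes.
policy = lambda x: int(rng.integers(4))
print(simulate_bandit_round(np.ones(3), label=2, policy=policy))
```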

Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles [article]

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey
2021 arXiv   pre-print
We propose a simple family of contextual bandit algorithms that adapt to misspecification error by reverting to a good safe policy when there is evidence that misspecification is causing a regret increase  ...  Computationally efficient contextual bandits are often based on estimating a predictive model of rewards given contexts and arms using past data.  ...  Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pp. 1638-1646, 2014. Agrawal, S. and Goyal, N.  ...
arXiv:2102.13240v2 fatcat:ho4raxl7xfcjnbw5n2crzlnqzy

Corralling a Band of Bandit Algorithms [article]

Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire
2017 arXiv   pre-print
Our results are applicable to many settings, such as multi-armed bandits, contextual bandits, and convex bandits. As examples, we present two main applications.  ...  We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the  ...  Acknowledgments The authors would like to thank John Langford for posing the question initially that stimulated this research.  ... 
arXiv:1612.06246v3 fatcat:eighvfwsfzajfozv4sigtwjudq
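
To show the shape of the master/base interaction the abstract describes, here is a deliberately simplified master that selects a base bandit algorithm by exponential weights and updates with importance-weighted rewards. Corral itself uses log-barrier mirror descent with changing learning rates; this EXP3-style stand-in only illustrates the structure.

```python
import numpy as np

class SimpleMaster:
    """Toy master over M base bandit algorithms: pick a base learner by
    exponential weights, run it for the round, and update the weights
    with an importance-weighted reward estimate."""
    def __init__(self, M, eta=0.1):
        self.logw = np.zeros(M)
        self.eta = eta

    def sample(self, rng):
        p = np.exp(self.logw - self.logw.max())
        p /= p.sum()
        return int(rng.choice(len(p), p=p)), p

    def update(self, i, p, reward):
        # Importance weighting keeps the reward estimate unbiased.
        self.logw[i] += self.eta * reward / p[i]
```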

Model selection for contextual bandits [article]

Dylan J. Foster, Akshay Krishnamurthy, and Haipeng Luo
2019 arXiv   pre-print
We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation.  ...  Our main result is a new model selection guarantee for linear contextual bandits.  ...  References Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, 2011.  ... 
arXiv:1906.00531v3 fatcat:x2vtebnh5jalbmya7ewvqlcgbe

Top-k eXtreme Contextual Bandits with Arm Hierarchy [article]

Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon
2021 arXiv   pre-print
Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous, and the  ...  We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$, where $A$ is the total number of arms and $\mathcal{F}$ is the class containing the regression function  ...  Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. Taming the monster: A fast and simple algorithm for contextual bandits.  ... 
arXiv:2102.07800v1 fatcat:fkojyurufvhjphnie5m52qzcym
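
For intuition about the top-$k$ action sets in this setting, here is a toy selection rule: take the $k$ highest plug-in scores and occasionally swap one slot for a random arm as exploration. This simple epsilon scheme is an assumption for illustration only; the paper's algorithm instead exploits the arm hierarchy to handle enormous action spaces.

```python
import numpy as np

def select_top_k(scores, k, epsilon, rng):
    """Pick k arms: usually the k highest-scoring ones, but with
    probability epsilon replace the weakest slot with a uniformly
    random unselected arm."""
    top = list(np.argsort(scores)[-k:][::-1])   # best k, descending
    if rng.random() < epsilon:
        candidates = [a for a in range(len(scores)) if a not in top]
        top[-1] = int(rng.choice(candidates))
    return top

rng = np.random.default_rng(0)
print(select_top_k(np.array([0.1, 0.8, 0.5, 0.9, 0.3]), k=2,
                   epsilon=0.2, rng=rng))
```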

Adapting to Misspecification in Contextual Bandits [article]

Dylan J. Foster, Claudio Gentile, Mehryar Mohri, and Julian Zimmert
2021 arXiv   pre-print
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits that adapt to unknown model misspecification -- both for finite and infinite action settings.  ...  Specializing to linear contextual bandits with infinite actions in $d$ dimensions, we obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown  ...  Acknowledgements DF acknowledges the support of NSF TRIPODS grant #1740751. We thank Teodor Marinov and Alexander Rakhlin for discussions on related topics.  ... 
arXiv:2107.05745v1 fatcat:aapvoy6xovh4nd5lizacrwr5ai

Efficient Algorithms for Adversarial Contextual Learning [article]

Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire
2016 arXiv   pre-print
We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem.  ...  In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies  ...  Agarwal, Alekh, Hsu, Daniel, Kale, Satyen, Langford, John, Li, Lihong, and Schapire, Robert E. Taming the monster: A fast and simple algorithm for contextual bandits.  ... 
arXiv:1602.02454v1 fatcat:dolh3jbruvevxiqdorophjlr7u

Neural Contextual Bandits with UCB-based Exploration [article]

Dongruo Zhou, Lihong Li, and Quanquan Gu
2020 arXiv   pre-print
To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.  ...  We propose a new algorithm, NeuralUCB, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB  ...  Acknowledgement We would like to thank the anonymous reviewers for their helpful comments. This research was sponsored in part by the National Science Foundation IIS-1904183 and IIS-1906169.  ... 
arXiv:1911.04462v3 fatcat:3u6erwajyfbnxpn3wvehrk6j3u
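
The UCB construction sketched in this abstract scores each arm by a point estimate plus a width computed from feature vectors. A minimal version of that index over generic features is below, with a Sherman-Morrison update of the inverse design matrix; in NeuralUCB the vector `g` would be the gradient of the network output at the candidate arm, whereas here it is just a raw feature vector, an assumption for brevity.

```python
import numpy as np

class UCBOverFeatures:
    """NeuralUCB-flavor index: score = f_value + gamma * sqrt(g' Z^-1 g),
    where Z = lam*I + sum of outer products of chosen arms' features."""
    def __init__(self, d, lam=1.0, gamma=1.0):
        self.Zinv = np.eye(d) / lam
        self.gamma = gamma

    def score(self, f_value, g):
        width = np.sqrt(g @ self.Zinv @ g)
        return f_value + self.gamma * width

    def update(self, g):
        # Sherman-Morrison rank-one update of Z^-1 after playing an arm.
        Zg = self.Zinv @ g
        self.Zinv -= np.outer(Zg, Zg) / (1.0 + g @ Zg)
```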

Stochastic Linear Contextual Bandits with Diverse Contexts [article]

Weiqiang Wu, Jing Yang, Cong Shen
2020 arXiv   pre-print
We design the LinUCB-d algorithm, and propose a novel approach to analyze its regret performance.  ...  In this paper, we investigate the impact of context diversity on stochastic linear contextual bandits.  ...  Acknowledgements JY acknowledges the support from U.S. National Science Foundation under Grant ECCS-1650299.  ... 
arXiv:2003.02681v1 fatcat:e6qz5bqbqzf4hgzfoq4cli6lke
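
For reference, the standard disjoint-model LinUCB index that LinUCB-d builds on: a per-arm ridge-regression estimate plus a confidence width. The exploration parameter `alpha` and ridge weight `lam` below are illustrative defaults; the paper's contribution is the diversity-based regret analysis, not this index itself.

```python
import numpy as np

class LinUCBArm:
    """One arm of LinUCB: ridge estimate theta = A^-1 b and
    width alpha * sqrt(x' A^-1 x)."""
    def __init__(self, d, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(d)   # lam*I + sum of x x' for this arm
        self.b = np.zeros(d)       # sum of r * x for this arm
        self.alpha = alpha

    def ucb(self, x):
        Ainv = np.linalg.inv(self.A)
        theta = Ainv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ Ainv @ x)

    def update(self, x, r):
        self.A += np.outer(x, x)
        self.b += r * x
```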

Learning Reductions that Really Work [article]

Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro
2015 arXiv   pre-print
We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine  ...  References [1] Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert E. Schapire. Taming the monster: A fast and simple algorithm for contextual bandits.  ...  Contextual Bandit Learning In contextual bandit learning, a learning algorithm needs to be applied to exploration data to learn a policy for acting in the world.  ... 
arXiv:1502.02704v1 fatcat:6je6nyymifh47ca455ip3ryy5u
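
The snippet's last fragment describes learning a policy from exploration data. The textbook device for evaluating a candidate policy on such logged data is inverse propensity scoring, sketched below; it is a generic estimator under the assumption that the logging probabilities were recorded, not a method specific to this paper.

```python
import numpy as np

def ips_value(policy, contexts, actions, rewards, propensities):
    """Estimate a policy's value from logged exploration data via
    inverse propensity scoring: average of r * 1{policy(x)=a} / p(a|x).
    Unbiased when the logged propensities p(a|x) are correct."""
    vals = [r * float(policy(x) == a) / p
            for x, a, r, p in zip(contexts, actions, rewards, propensities)]
    return float(np.mean(vals))
```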

Online Algorithm for Unsupervised Sequential Selection with Contextual Information [article]

Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama
2020 arXiv   pre-print
Under CWD, we propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret. Experiments on synthetic and real datasets validate our algorithm.  ...  In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback  ...  Csaba Szepesvári gratefully acknowledges funding from the Canada CIFAR AI Chairs Program, Amii, and NSERC.  ... 
arXiv:2010.12353v1 fatcat:drhvr7sw7vaffafuixyo2nj2zu

Practical Contextual Bandits with Regression Oracles [article]

Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire
2018 arXiv   pre-print
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.  ...  Our algorithms leverage the availability of a regression oracle for the value-function class, a more realistic and reasonable oracle than the classification oracles over policies typically assumed by agnostic  ...  Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pp. 1638-1646, 2014. Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R.  ... 
arXiv:1803.01088v1 fatcat:f2nmxg3izvbifhr3m2fg72uftu
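
The regression-oracle viewpoint in this abstract can be made concrete with the simplest oracle-based baseline: epsilon-greedy over a fitted reward regressor. The paper's own algorithms build confidence bounds on top of the oracle; the sketch below (using scikit-learn's Ridge as the oracle, an assumption for illustration) only shows the oracle interface.

```python
import numpy as np
from sklearn.linear_model import Ridge

class EpsGreedyRegression:
    """Epsilon-greedy over a regression oracle: fit reward on
    (context, arm) features, act greedily on predictions w.p. 1-eps."""
    def __init__(self, K, eps=0.1):
        self.K, self.eps = K, eps
        self.model, self.X, self.y = Ridge(alpha=1.0), [], []

    def _feat(self, x, a):
        # Concatenate context with a one-hot arm encoding.
        return np.concatenate([x, np.eye(self.K)[a]])

    def act(self, x, rng):
        if not self.y or rng.random() < self.eps:
            return int(rng.integers(self.K))
        preds = self.model.predict(
            np.stack([self._feat(x, a) for a in range(self.K)]))
        return int(np.argmax(preds))

    def update(self, x, a, r):
        self.X.append(self._feat(x, a)); self.y.append(r)
        self.model.fit(np.stack(self.X), np.array(self.y))
```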
Showing results 1 — 15 out of 39 results