366 Hits in 2.7 sec

Efficient Kernel UCB for Contextual Bandits [article]

Houssam Zenati, Alberto Bietti, Eustache Diemert, Julien Mairal, Matthieu Martin, Pierre Gaillard
2022 arXiv   pre-print
In this paper, we tackle the computational efficiency of kernelized UCB algorithms in contextual bandits.  ...  While standard methods require O(CT^3) complexity, where T is the horizon and the constant C is related to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems  ...  (Calandriello et al., 2019, 2020).  ...  Warm-up: Kernel-UCB for Contextual Bandits: In this section, we introduce stochastic contextual bandits with reward functions lying in an RKHS  ... 
arXiv:2202.05638v1 fatcat:i322q2e2kjbjjidjygqiltbi5u
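The O(CT^3) bottleneck this paper targets comes from the Gram-matrix inverse inside the kernelized UCB rule. A minimal NumPy sketch of that rule (the RBF kernel and the `lam`/`beta` values are illustrative assumptions, not the paper's choices):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of a and rows of b.
    d = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d)

def kernel_ucb_scores(X_hist, y_hist, X_cand, lam=1.0, beta=2.0):
    """UCB score for each candidate context-action feature vector.

    Inverting the t x t regularized Gram matrix costs O(t^3) per round,
    which is exactly the cost that efficient variants aim to avoid.
    """
    t = len(X_hist)
    K_inv = np.linalg.inv(rbf(X_hist, X_hist) + lam * np.eye(t))  # O(t^3)
    k_star = rbf(X_hist, X_cand)                                  # shape (t, n_cand)
    mean = k_star.T @ (K_inv @ y_hist)
    var = rbf(X_cand, X_cand).diagonal() - np.sum(k_star * (K_inv @ k_star), axis=0)
    return mean + beta * np.sqrt(np.maximum(var, 0.0))
```

At each round the learner plays the candidate with the highest score; rebuilding the inverse from scratch every round is what drives the cubic dependence on the horizon.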

An Empirical Study of Neural Kernel Bandits [article]

Michal Lisicki, Arash Afkanpour, Graham W. Taylor
2021 arXiv   pre-print
While contextual bandits in general commonly utilize Gaussian process (GP) predictive distributions for decision making, the most successful neural variants use only the last-layer parameters in the derivation  ...  Neural bandits have enabled practitioners to operate efficiently on problems with non-linear reward functions.  ...  The authors thank the Canada Foundation for Innovation and Compute Canada for computing resources.  ... 
arXiv:2111.03543v1 fatcat:lmewgppoxzelxenjr2hk3itxym

Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration [article]

Xiaoxiao Wang, Xueying Guo, Jie Chuai, Zhitang Chen, Xin Liu
2019 arXiv   pre-print
To leverage such similarities, we propose a kernel-based multi-BS contextual bandit algorithm based on multi-task learning.  ...  We present theoretical analysis of the proposed algorithm in terms of regret and multi-task-learning efficiency.  ...  Multi-Task Contextual Bandit: We model the problem as multi-task contextual bandits. Now, we briefly introduce the classical bandit model and contextual bandit model.  ... 
arXiv:1811.10902v2 fatcat:jrxsyvwb3jdzdhbhd3z7uzshuu

Contextual Gaussian Process Bandit Optimization

Andreas Krause, Cheng Soon Ong
2011 Neural Information Processing Systems  
We show that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications.  ...  We further provide generic tools for deriving regret bounds when using such composite kernel functions.  ...  Acknowledgments The authors wish to thank Christian Widmer for providing the MHC data, as well as Daniel Golovin and Aleksandrs Slivkins for helpful discussions.  ... 
dblp:conf/nips/KrauseO11 fatcat:rkjxwm2xtza6bbznodzznyitq4
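The "mixing and matching" of kernels in CGP-UCB amounts to composing a kernel on contexts with a kernel on actions, typically by product or sum. A hedged sketch of the two compositions (the RBF components and their hyperparameters are illustrative placeholders):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # RBF kernel matrix between rows of a and rows of b.
    d = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d)

def product_kernel(ctx1, act1, ctx2, act2):
    # k((z, s), (z', s')) = k_ctx(z, z') * k_act(s, s'):
    # the reward surface varies jointly with context and action.
    return rbf(ctx1, ctx2) * rbf(act1, act2)

def sum_kernel(ctx1, act1, ctx2, act2):
    # k((z, s), (z', s')) = k_ctx(z, z') + k_act(s, s'):
    # additive context and action effects.
    return rbf(ctx1, ctx2) + rbf(act1, act2)
```

Both compositions preserve positive semi-definiteness (the product via the Schur product theorem), so either can be dropped into a GP-UCB-style algorithm over the joint context-action space.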

Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs [article]

Yibo Yang, Antoine Blanchard, Themistoklis Sapsis, Paris Perdikaris
2021 arXiv   pre-print
Finally, we provide a JAX library for efficient bandit optimization using Gaussian processes.  ...  We present a new type of acquisition function for online decision making in multi-armed and contextual bandit problems with extreme payoffs.  ...  kernel [36], or using automatic differentiation [43] for more general kernel choices.  ... 
arXiv:2102.10085v2 fatcat:3r2cyu5enrhrra6pl6adsx6gk4

Dynamic allocation optimization in A/B tests using classification-based preprocessing

Emmanuelle Claeys, Pierre Gancarski, Myriam Maumy-Bertrand, Hubert Wassner
2021 IEEE Transactions on Knowledge and Data Engineering  
One way to avoid this is to apply a bandit-based algorithm. Such an algorithm automatically decides whether a page should be chosen and applied more often than the other.  ...  We present our new method that finds the best variation for homogeneous groups in a short period of time.  ...  The kernelized stochastic contextual bandit Kernel-UCB [45] provides a non-linear model of the reward function (like GLM-UCB). It uses a reproducing kernel Hilbert space (RKHS).  ... 
doi:10.1109/tkde.2021.3076025 fatcat:l6ifydtzdfgyjfyupwhajtit7a

Finite-Time Analysis of Kernelised Contextual Bandits [article]

Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nelo Cristianini
2013 arXiv   pre-print
Moreover, for the linear kernel, our regret bound matches the lower bound for contextual linear bandits.  ...  For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in the  ...  A non-linear kernel function creates a kernelised UCB algorithm for a non-linear bandit.  ... 
arXiv:1309.6869v1 fatcat:xyplphw5jrfrlfzoiqxkytfiem

Bandits for Learning to Explain from Explanations [article]

Freya Behrens, Stefano Teso, Davide Mottin
2021 arXiv   pre-print
We introduce Explearn, an online algorithm that learns to jointly output predictions and explanations for those predictions. Explearn leverages Gaussian Processes (GP)-based contextual bandits.  ...  Second, Explearn builds on recent results in contextual bandits which guarantee convergence with high probability. Our initial experiments hint at the promise of the approach.  ...  Efficient solvers do exist for special cases, e.g., when the kernel is linear.  ... 
arXiv:2102.03815v1 fatcat:74as4c7qpfdobhjdl3qz34izs4

An Analysis of Reinforcement Learning for Malaria Control [article]

Ndivhuwo Makondo, Arinze Lawrence Folarin, Simphiwe Nhlahla Zitha, Sekou Lionel Remy
2021 arXiv   pre-print
The problem has been formulated as multi-armed bandits, contextual bandits and a Markov Decision Process in isolation.  ...  Previous work on policy learning for Malaria control has often formulated the problem as an optimization problem assuming the objective function and the search space have a specific structure.  ...  for sample efficiency.  ...  GP-UCB and CGP-UCB: As described in Section 4.1.3, GP-UCB is governed by the choice of β_t (Eq. 21), the kernel function K for the GP (Eq. 17) and the Gaussian noise ε_t on the  ... 
arXiv:2107.08988v1 fatcat:4n7bpmlc2bdh7nwaqxlkwttbcq
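The GP-UCB rule mentioned in this entry selects the action maximizing the posterior mean plus a β_t-scaled posterior standard deviation. A minimal sketch under standard GP-regression assumptions (the RBF kernel, noise level, and β_t value here are placeholders, not the paper's Eq. 17/21 choices):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # RBF kernel matrix between rows of a and rows of b.
    d = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * d)

def gp_ucb_choose(X_hist, y_hist, X_cand, beta_t, noise=0.1):
    # GP posterior at the candidates, then argmax of mean + sqrt(beta_t) * std.
    A = rbf(X_hist, X_hist) + noise**2 * np.eye(len(y_hist))
    k_star = rbf(X_hist, X_cand)
    mean = k_star.T @ np.linalg.solve(A, y_hist)
    var = rbf(X_cand, X_cand).diagonal() - np.sum(k_star * np.linalg.solve(A, k_star), axis=0)
    return int(np.argmax(mean + np.sqrt(beta_t) * np.sqrt(np.maximum(var, 0.0))))
```

Larger β_t favors actions whose payoff is still uncertain (exploration); β_t → 0 reduces the rule to greedy exploitation of the posterior mean.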

Multi-Task Learning for Contextual Bandits [article]

Aniket Anand Deshmukh, Urun Dogan, Clayton Scott
2017 arXiv   pre-print
In this work, we propose a multi-task learning framework for contextual bandit problems.  ...  Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized  ...  There are some contextual bandit setups that incorporate multi-task learning.  ... 
arXiv:1705.08618v1 fatcat:p4v5iss7uvcvrnt7t2khry3anq

Interactive Submodular Bandit

Lin Chen, Andreas Krause, Amin Karbasi
2017 Neural Information Processing Systems  
More specifically, given a bounded-RKHS-norm kernel over the context-action-payoff space that governs the smoothness of the utility function, SM-UCB keeps an upper-confidence bound on the payoff function  ...  We develop SM-UCB, which efficiently trades off exploration (collecting more data) and exploitation (proposing a good action given gathered data) and achieves an O(√T) regret bound after T rounds of interaction  ...  In the literature, there are many variants of the multi-armed bandit problem and corresponding solutions, for example, the EXP3 algorithm for adversarial bandits [3] and LinUCB for stochastic contextual bandits  ... 
dblp:conf/nips/00030K17 fatcat:y5aguvdf6rbfhnqvotdifhfu74

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Gradient perturbation, widely used for differentially private optimization, injects noise at every iterative update to guarantee differential privacy.  ...  We show that for differentially private convex optimization, the utility guarantee of differentially private (stochastic) gradient descent is determined by an expected curvature rather than the minimum  ...  As for nonlinear contextual bandits, [Valko et al., 2013] and [Deshmukh et al., 2017] both propose kernelized contextual bandits as a nonlinear version of Lin-UCB by finding linear members in an RKHS  ... 
doi:10.24963/ijcai.2020/427 dblp:conf/ijcai/YangYR20 fatcat:dohvvzu2vbcirjkc2tppwmegt4

Contextual Games: Multi-Agent Learning with Side Information

Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour
2020 Neural Information Processing Systems  
We define game-theoretic notions of contextual Coarse Correlated Equilibria (c-CCE) and optimal contextual welfare for this new class of games and show that c-CCEs and optimal welfare can be approached  ...  By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes and propose a novel online (meta) algorithm that exploits such correlations to minimize  ...  Game Equilibria and Efficiency In this section, we introduce new notions of equilibria and efficiency for contextual games.  ... 
dblp:conf/nips/SessaB0K20 fatcat:3phzmecfqbhgnlyyinfmkz73cy

Stochastic Bandits with Context Distributions [article]

Johannes Kirschner, Andreas Krause
2019 arXiv   pre-print
We adapt the UCB algorithm to this setting and show that it achieves an order-optimal high-probability bound on the cumulative regret for linear and kernelized reward functions.  ...  We introduce a stochastic contextual bandit model where at each time step the environment chooses a distribution over a context set and samples the context from this distribution.  ...  Acknowledgments The authors thank Agroscope for providing the crop yield data set, in particular Didier Pellet, Lilia Levy and Juan Herrera, who collected the winter wheat data, and Annelie Holzkämper,  ... 
arXiv:1906.02685v2 fatcat:kq7555kdqzbvjewkoucvsjiise

Misspecified Gaussian Process Bandit Optimization [article]

Ilija Bogunovic, Andreas Krause
2021 arXiv   pre-print
We consider the problem of optimizing a black-box function based on noisy bandit feedback. Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.  ...  In addition, in a stochastic contextual setting, we show that EC-GP-UCB can be effectively combined with the regret bound balancing strategy and attain similar regret bounds despite not knowing ϵ.  ...  Moreover, when the kernel used is linear, we recover the same misspecification regret rate of [24], i.e., Õ(ϵT√d).  ...  Algorithm for the contextual misspecified kernelized bandit setting: In this section  ... 
arXiv:2111.05008v1 fatcat:2rvvzi3iejcq3funb5onzxwuy4
Showing results 1 — 15 out of 366 results