Tuning Bandit Algorithms in Stochastic Environments
[chapter]
2007
Lecture Notes in Computer Science
In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. ...
In earlier experimental works, such algorithms were found to outperform the competing algorithms. ...
Introduction and notations: In this paper we consider stochastic multi-armed bandit problems. ...
doi:10.1007/978-3-540-75225-7_15
fatcat:rwgpwxjq3vaynn4fqpptwfbcqq
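The variance-aware index described in this abstract can be made concrete with a short sketch. This is a minimal reading of a UCB-V-style rule, assuming Bernoulli arms; the constants, the exploration function, and the environment below are illustrative assumptions, not the paper's exact tuning:

```python
import math
import random

def ucbv_index(mean, var, count, t, c=1.0, zeta=1.2):
    """Variance-aware index (UCB-V style): empirical mean plus a
    variance-driven confidence term plus a range-correction term."""
    e = zeta * math.log(t)
    return mean + math.sqrt(2 * max(var, 0.0) * e / count) + c * 3 * e / count

def pull(arm):
    # Hypothetical environment: Bernoulli arms with unknown means.
    return float(random.random() < [0.3, 0.5, 0.7][arm])

n_arms, horizon = 3, 10_000
counts, sums, sq_sums = [0] * n_arms, [0.0] * n_arms, [0.0] * n_arms

for t in range(1, horizon + 1):
    if t <= n_arms:                      # play each arm once first
        arm = t - 1
    else:
        arm = max(range(n_arms), key=lambda k: ucbv_index(
            sums[k] / counts[k],
            sq_sums[k] / counts[k] - (sums[k] / counts[k]) ** 2,
            counts[k], t))
    r = pull(arm)
    counts[arm] += 1
    sums[arm] += r
    sq_sums[arm] += r * r
```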
Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits
[article]
2017
bioRxiv
pre-print
stochastic multi-armed bandit tasks. ...
Fast adaptation to changes in the environment requires both natural and artificial agents to be able to dynamically tune an exploration-exploitation trade-off during learning. ...
We tested the performance of several alternative algorithms on a set of different non-stationary bandit setups in which we varied the most crucial components of a stochastic and changing environment. ...
doi:10.1101/117598
fatcat:qb6qicn46ffuhf7dp42kirhqd4
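As a point of reference for the non-stationary setups discussed above, a common baseline (not necessarily one of the algorithms compared in this paper) is a sliding-window UCB that estimates each arm only from its most recent plays. The window size and exploration constant below are assumptions:

```python
import math
from collections import deque

class SlidingWindowUCB:
    """UCB1 computed only over the last `window` plays of each arm,
    so estimates can track a changing environment. A generic baseline
    sketch, not one of the specific algorithms compared in the paper."""
    def __init__(self, n_arms, window=200, c=2.0):
        self.hist = [deque(maxlen=window) for _ in range(n_arms)]
        self.c = c
        self.t = 0

    def select(self):
        self.t += 1
        for k, h in enumerate(self.hist):
            if not h:                    # play unseen arms first
                return k
        return max(range(len(self.hist)), key=lambda k:
                   sum(self.hist[k]) / len(self.hist[k])
                   + math.sqrt(self.c * math.log(self.t) / len(self.hist[k])))

    def update(self, arm, reward):
        self.hist[arm].append(reward)
```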
Lifelong Learning in Multi-Armed Bandits
[article]
2020
arXiv
pre-print
We propose a bandit over bandit approach with greedy algorithms and we perform extensive experimental evaluations in both stationary and non-stationary environments. ...
We specifically focus on confidence interval tuning of UCB algorithms. ...
Acknowledgments and Disclosure of Funding The research presented was supported by the French National Research Agency, under the project BOLD (ANR19-CE23-0026-04) and it was also supported in part by a ...
arXiv:2012.14264v1
fatcat:rycomukpefcurg6ccipuu2esqy
Learning to Optimize under Non-Stationarity
[article]
2021
arXiv
pre-print
It captures natural applications such as dynamic pricing and ads allocation in a changing environment. ...
We introduce algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary linear stochastic bandit setting. ...
In this paper, we design and analyze novel algorithms for the linear bandit problem in a drifting environment. ...
arXiv:1810.03024v6
fatcat:x56ge6hra5hynnmtymq5gyxqka
Synergies between Evolutionary Algorithms and Reinforcement Learning
2015
Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference - GECCO Companion '15
Applied in tuning the parameters of EC algorithms. ...
Multi-armed bandit (MAB) algorithms: intuition for stochastic MAB. The algorithm starts by fairly exploring the N arms (= actions); an agent then selects among the N arms such that the expected reward over time ...
Use model-free RL or multi-armed bandit techniques for parameter tuning; the schemata theorem as initially used in association with multi-armed bandits by [Holland, 1975]; variants of Monte Carlo Tree ...
doi:10.1145/2739482.2756582
dblp:conf/gecco/Drugan15
fatcat:5jedfs4jmfgclcpppzcjvz6yvu
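The slide snippets above sketch the stochastic MAB intuition; in code, the simplest such technique for parameter tuning is ε-greedy over a finite set of candidate values. The environment, candidate count, and ε below are illustrative assumptions:

```python
import random

def epsilon_greedy(n_arms, pull, horizon=10_000, eps=0.1):
    """Plays each arm once, then with probability eps explores
    uniformly and otherwise exploits the best empirical mean."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t
        elif random.random() < eps:
            arm = random.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=means.__getitem__)
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return means

# Hypothetical use: choosing among 3 candidate parameter values,
# each modeled as a Bernoulli arm with an unknown success rate.
means = epsilon_greedy(3, lambda a: float(random.random() < [0.2, 0.5, 0.8][a]))
```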
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit
[article]
2015
arXiv
pre-print
We first explore the traditional stochastic model of a multi-armed bandit, then explore a taxonomic scheme of complications to that model, for each complication relating it to a specific requirement or ...
Adaptive and sequential experiment design is a well-studied area in numerous domains. ...
[129] present Exp3.M as the best known algorithm for applying an adversarial approach to bandits in a multiple-plays environment. ...
arXiv:1510.00757v4
fatcat:eyxqdq3yl5fpdbv53wtnkfa25a
The Pareto Frontier of model selection for general Contextual Bandits
[article]
2021
arXiv
pre-print
Even in the purely stochastic regime, the desired results are unobtainable. ...
Model selection for general contextual bandits with nested policy classes has come under particular scrutiny, resulting in a COLT 2020 open problem. ...
We thank Tor Lattimore for pointing us to the technicalities required for bounding the total variation of improper algorithms. ...
arXiv:2110.13282v1
fatcat:ozhsrm2rfna4rfrvhffmj44hb4
AutoML for Contextual Bandits
[article]
2022
arXiv
pre-print
Contextual bandits are among the most widely used techniques in applications such as personalization, recommendation systems, mobile health, and causal marketing. ...
We see that our model outperforms or performs comparatively to other models while requiring no tuning nor feature engineering. ...
intending it is a stochastic variable). ...
arXiv:1909.03212v2
fatcat:scixdaefijbupj4nw7wcdsesva
Semi-Parametric Sampling for Stochastic Bandits with Many Arms
2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
We consider the stochastic bandit problem with a large candidate arm set. ...
In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt a non-parametric reward model, are inefficient due to the large number of arms. ...
Few works study the stochastic multi-armed bandit problem in a semi-parametric environment. ...
doi:10.1609/aaai.v33i01.33017933
fatcat:zzhp67pai5debm46pov7mvio5e
Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax
[chapter]
2011
Lecture Notes in Computer Science
The method is evaluated in experiments having deterministic rewards and a mixture of both deterministic and stochastic rewards. ...
The results show that a VDBE-Softmax policy can outperform ε-greedy, Softmax and VDBE policies in combination with on- and off-policy learning algorithms such as Q-learning and Sarsa. ...
In addition to the original multi-armed bandit problem [9], the environment in the bandit world consists of multiple states and diverse bandits (in this example, states B1 and B2). ...
doi:10.1007/978-3-642-24455-1_33
fatcat:eod4k25jnrdhrfix4ea67krjba
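A rough sketch of the VDBE-Softmax idea described above: the exploration rate ε is adapted from the magnitude of the temporal-difference error, and exploration itself uses a softmax over value estimates. The update form and constants (σ, δ, τ) below are paraphrased assumptions rather than the paper's precise formulation:

```python
import math
import random

def softmax_choice(q, tau=1.0):
    """Boltzmann action selection over value estimates q."""
    m = max(q)
    w = [math.exp((v - m) / tau) for v in q]
    r = random.random() * sum(w)
    for a, x in enumerate(w):
        r -= x
        if r <= 0:
            return a
    return len(q) - 1

def vdbe_epsilon(eps, td_error, sigma=1.0, delta=0.5):
    """Value-difference based update: large TD errors push epsilon up
    (more exploration); small ones let it decay toward exploitation."""
    f = (1 - math.exp(-abs(td_error) / sigma)) / (1 + math.exp(-abs(td_error) / sigma))
    return delta * f + (1 - delta) * eps

def vdbe_softmax_action(q, eps, tau=1.0):
    """VDBE-Softmax: explore via softmax with probability eps, else act greedily."""
    if random.random() < eps:
        return softmax_choice(q, tau)
    return max(range(len(q)), key=q.__getitem__)
```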
Bayesian Unification of Gradient and Bandit-Based Learning for Accelerated Global Optimisation
2016
2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)
Our empirical results demonstrate that by unifying bandit and gradient based learning, one obtains consistently improved performance across a wide spectrum of problem environments. ...
Due to the pervasiveness of bandit based optimisation, our scheme opens up for improved performance both in meta-optimisation and in applications where gradient related information is readily available ...
In contrast, bandit algorithms are designed for on-line operation, aiming to converge to the optimal arm (the global optimum) in as few trials as possible. ...
doi:10.1109/icmla.2016.0044
dblp:conf/icmla/Granmo16
fatcat:3ep5f5abnnho7awhdrgcchfjou
Corralling a Band of Bandit Algorithms
[article]
2017
arXiv
pre-print
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the ...
The first is to create an algorithm that enjoys worst-case robustness while at the same time performing much better when the environment is relatively easy. ...
Regret bounds of the base algorithms (algorithm: R(T), α, environment):
ILOVETOCONBANDITS: R(T) = O(√(KT ln|Θ|)), α = 1/2, stochastic contextual bandit
BISTRO+: R(T) = O((KT)^(2/3) (ln|Θ|)^(1/3)), α = 1/3, hybrid contextual bandit
Epoch-Greedy: R(T) = O(T^(2/3) (K ln|Θ|)^(1/3)), α = 1/3, stochastic contextual ...
arXiv:1612.06246v3
fatcat:eighvfwsfzajfozv4sigtwjudq
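The corralling idea can be illustrated with a much simpler wrapper than the paper's actual method (which uses log-barrier mirror descent with increasing learning rates): an Exp3-style master treats each base bandit algorithm as an arm and feeds back importance-weighted rewards. Everything below is a simplified sketch, not the Corral algorithm itself:

```python
import math
import random

class Exp3Master:
    """Simplified master that treats each base bandit algorithm as an
    arm of an adversarial bandit (Exp3). Each round it samples a base
    algorithm, lets it act, and updates with an importance-weighted
    reward. Only the high-level wrapper idea, not the paper's method."""
    def __init__(self, n_bases, gamma=0.1):
        self.w = [1.0] * n_bases
        self.gamma = gamma

    def probs(self):
        s, n = sum(self.w), len(self.w)
        return [(1 - self.gamma) * wi / s + self.gamma / n for wi in self.w]

    def choose(self):
        p = self.probs()
        r, acc = random.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i, pi
        return len(p) - 1, p[-1]

    def update(self, i, p_i, reward):    # reward assumed in [0, 1]
        self.w[i] *= math.exp(self.gamma * (reward / p_i) / len(self.w))
```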
Data Poisoning Attacks on Stochastic Bandits
[article]
2019
arXiv
pre-print
In this paper, we propose a framework of offline attacks on bandit algorithms and study convex optimization based attacks on several popular bandit algorithms. ...
Stochastic multi-armed bandits form a class of online learning problems that have important applications in online recommendation systems, adaptive medical treatment, and many others. ...
ing that there is no robust and good stochastic bandit algorithm that can survive online poisoning attacks. For each reward r_t returned from the bandit environment, ...
arXiv:1905.06494v1
fatcat:uhmscq7d2fhmjjxse7zcm2raam
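To make the offline-attack setting concrete, here is a toy sketch (not the paper's convex-optimization attack): shift the logged rewards of non-target arms just enough that a chosen target arm has the best empirical mean. The margin parameter and data format are assumptions:

```python
def poison_offline(rewards_by_arm, target_arm, margin=0.05):
    """Toy offline poisoning sketch: lower the logged rewards of every
    non-target arm so the target arm's empirical mean leads by `margin`.
    rewards_by_arm maps arm index -> list of observed rewards."""
    target_mean = sum(rewards_by_arm[target_arm]) / len(rewards_by_arm[target_arm])
    poisoned = {}
    for arm, rs in rewards_by_arm.items():
        if arm == target_arm:
            poisoned[arm] = list(rs)
            continue
        mean = sum(rs) / len(rs)
        shift = max(0.0, mean - (target_mean - margin))
        poisoned[arm] = [r - shift for r in rs]
    return poisoned
```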
Bandit Algorithms for Precision Medicine
[article]
2021
arXiv
pre-print
details of bandit algorithms that have been used in mobile health. ...
Since these reviews were published, bandit algorithms have continued to find uses in mobile health and several new topics have emerged in the research on bandit algorithms. ...
In such scenarios, a best action that maximizes the total reward still exists, but algorithms designed for stochastic bandit environments are no longer guaranteed to work. ...
arXiv:2108.04782v1
fatcat:dni5wyzyerestgs3upuzz776n4
Active Reinforcement Learning: Observing Rewards at a Cost
[article]
2020
arXiv
pre-print
Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. ...
We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem. ...
The authors acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility in carrying out this work. ...
arXiv:2011.06709v2
fatcat:x5sce4tcavhrjjtqjybapmkeru
Showing results 1 — 15 out of 2,068 results