2,068 Hits in 3.4 sec

Tuning Bandit Algorithms in Stochastic Environments [chapter]

Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári
2007 Lecture Notes in Computer Science  
In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms.  ...  In earlier experimental works, such algorithms were found to outperform the competing algorithms.  ...  Introduction and notations In this paper we consider stochastic multi-armed bandit problems.  ... 
doi:10.1007/978-3-540-75225-7_15 fatcat:rwgpwxjq3vaynn4fqpptwfbcqq
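For orientation, a minimal sketch of a variance-aware UCB index in the spirit of the UCB-V algorithm this paper analyzes: the bonus depends on the empirical variance as well as the pull count. The constants zeta and c are illustrative assumptions, not the paper's tuned values.

```python
# Sketch of a variance-aware UCB index (UCB-V style); zeta and c are
# illustrative assumptions, not the paper's tuned constants.
import math

def ucbv_index(mean, var, n, t, zeta=1.2, c=1.0):
    """Empirical mean plus a variance-dependent exploration bonus."""
    e = zeta * math.log(t)
    return mean + math.sqrt(2.0 * var * e / n) + 3.0 * c * e / n

def play(arms, horizon):
    """arms: list of reward-sampling callables returning values in [0, 1]."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    sq_sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):                  # pull each arm once first
            k = t - 1
        else:
            def idx(a):
                mean = sums[a] / counts[a]
                var = max(sq_sums[a] / counts[a] - mean * mean, 0.0)
                return ucbv_index(mean, var, counts[a], t)
            k = max(range(len(arms)), key=idx)
        r = arms[k]()
        counts[k] += 1
        sums[k] += r
        sq_sums[k] += r * r
    return counts
```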

Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits [article]

George Velentzas, Costas Tzafestas, Mehdi Khamassi
2017 bioRxiv   pre-print
stochastic multi-armed bandit tasks.  ...  Fast adaptation to changes in the environment requires both natural and artificial agents to be able to dynamically tune an exploration-exploitation trade-off during learning.  ...  We tested the performance of several alternative algorithms on a set of different non-stationary bandit setups where we varied the most crucial components of a stochastic and changing environment.  ... 
doi:10.1101/117598 fatcat:qb6qicn46ffuhf7dp42kirhqd4
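A toy version of the kind of non-stationary setup the snippet describes: a Bernoulli bandit whose arm probabilities drift over time. The drift size and clipping bounds are assumptions, not the paper's experimental protocol.

```python
# Toy non-stationary (drifting) Bernoulli bandit; drift magnitude and
# clipping to [0, 1] are illustrative assumptions.
import random

class DriftingBandit:
    def __init__(self, n_arms, drift=0.02):
        self.p = [random.random() for _ in range(n_arms)]
        self.drift = drift

    def step(self, arm):
        reward = 1.0 if random.random() < self.p[arm] else 0.0
        # every arm's success probability takes a small bounded random walk
        self.p = [min(1.0, max(0.0, q + random.uniform(-self.drift, self.drift)))
                  for q in self.p]
        return reward
```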

Lifelong Learning in Multi-Armed Bandits [article]

Matthieu Jedor, Jonathan Louëdec, Vianney Perchet
2020 arXiv   pre-print
We propose a bandit over bandit approach with greedy algorithms and we perform extensive experimental evaluations in both stationary and non-stationary environments.  ...  We specifically focus on confidence interval tuning of UCB algorithms.  ...  Acknowledgments and Disclosure of Funding The research presented was supported by the French National Research Agency, under the project BOLD (ANR19-CE23-0026-04) and it was also supported in part by a  ... 
arXiv:2012.14264v1 fatcat:rycomukpefcurg6ccipuu2esqy
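A hedged sketch of the bandit-over-bandit idea applied to confidence-interval tuning of UCB: an outer chooser picks the confidence scaling alpha for each epoch from a small grid, guided by the reward each alpha has earned. The grid, epoch length, and epsilon-greedy outer loop are illustrative assumptions.

```python
# Bandit-over-bandit sketch: outer epsilon-greedy over UCB confidence
# widths; grid values, epoch length, and constants are assumptions.
import math
import random

def run_ucb(env, horizon, alpha):
    """Inner UCB(alpha) on env (list of Bernoulli means); returns total reward."""
    n = [0] * len(env); s = [0.0] * len(env); total = 0.0
    for t in range(1, horizon + 1):
        if t <= len(env):
            k = t - 1
        else:
            k = max(range(len(env)),
                    key=lambda a: s[a] / n[a] + math.sqrt(alpha * math.log(t) / n[a]))
        r = 1.0 if random.random() < env[k] else 0.0
        n[k] += 1; s[k] += r; total += r
    return total

grid = [0.1, 0.5, 1.0, 2.0]                  # candidate confidence widths
stats = {a: [0, 0.0] for a in grid}          # epochs played, summed epoch reward
env = [0.3, 0.5, 0.7]
for epoch in range(50):
    if epoch < len(grid) or random.random() < 0.2:
        alpha = grid[epoch % len(grid)]      # explore the grid
    else:
        alpha = max(grid, key=lambda a: stats[a][1] / stats[a][0])  # exploit
    g = run_ucb(env, horizon=500, alpha=alpha)
    stats[alpha][0] += 1; stats[alpha][1] += g
```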

Learning to Optimize under Non-Stationarity [article]

Wang Chi Cheung and David Simchi-Levi and Ruihao Zhu
2021 arXiv   pre-print
It captures natural applications such as dynamic pricing and ads allocation in a changing environment.  ...  We introduce algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary linear stochastic bandit setting.  ...  In this paper, we design and analyze novel algorithms for the linear bandit problem in a drifting environment.  ... 
arXiv:1810.03024v6 fatcat:x56ge6hra5hynnmtymq5gyxqka
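One standard tool for such drifting environments is a sliding-window UCB, where only the last W pulls inform each arm's statistics; a minimal sketch follows. The window length and bonus constant are assumptions, and the paper's own algorithms target the linear setting rather than this simple finite-armed variant.

```python
# Sliding-window UCB sketch: statistics computed over the last W pulls
# so old observations are forgotten; W and c are assumptions.
import math
from collections import deque

def sw_ucb(arms, horizon, window=200, c=2.0):
    history = deque()                        # (arm, reward) pairs, max length W
    n = [0] * len(arms); s = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            k = t - 1
        else:
            k = max(range(len(arms)),
                    key=lambda a: s[a] / n[a] + math.sqrt(c * math.log(min(t, window)) / n[a])
                    if n[a] > 0 else float("inf"))
        r = arms[k]()
        history.append((k, r)); n[k] += 1; s[k] += r
        if len(history) > window:            # forget the oldest observation
            old_k, old_r = history.popleft()
            n[old_k] -= 1; s[old_k] -= old_r
    return s
```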

Synergies between Evolutionary Algorithms and Reinforcement Learning

Madalina M. Drugan
2015 Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation - GECCO Companion '15  
• Applied in tuning the parameters of EC algorithms  ...  Multi-armed bandit (MAB) algorithms • Intuition for stochastic MAB • The algorithm starts by fairly exploring the N arms (= actions) • An agent selects between N arms such that the expected reward over time  ...  use model-free RL or multi-armed bandit techniques for parameter tuning • Schemata theorem as initially used in association with multi-armed bandits by [Holland, 1975] • Variants of Monte Carlo Tree  ... 
doi:10.1145/2739482.2756582 dblp:conf/gecco/Drugan15 fatcat:5jedfs4jmfgclcpppzcjvz6yvu
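The slide's intuition in code: a minimal epsilon-greedy agent selecting among N arms so as to maximize expected reward over time. The epsilon value is an assumption.

```python
# Minimal epsilon-greedy agent for a stochastic MAB; eps is an assumption.
import random

def epsilon_greedy(arms, horizon, eps=0.1):
    """arms: list of reward-sampling callables."""
    n = [0] * len(arms); mean = [0.0] * len(arms)
    for t in range(horizon):
        if t < len(arms) or random.random() < eps:
            k = random.randrange(len(arms))                   # explore
        else:
            k = max(range(len(arms)), key=lambda a: mean[a])  # exploit
        r = arms[k]()
        n[k] += 1
        mean[k] += (r - mean[k]) / n[k]                       # incremental mean
    return mean
```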

A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit [article]

Giuseppe Burtini, Jason Loeppky, Ramon Lawrence
2015 arXiv   pre-print
We first explore the traditional stochastic model of a multi-armed bandit, then explore a taxonomic scheme of complications to that model, for each complication relating it to a specific requirement or  ...  Adaptive and sequential experiment design is a well-studied area in numerous domains.  ...  [129] present Exp3.M as the best known algorithm for applying an adversarial approach to bandits in a multiple-plays environment.  ... 
arXiv:1510.00757v4 fatcat:eyxqdq3yl5fpdbv53wtnkfa25a
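For context on the adversarial approach the survey mentions, here is a sketch of the basic Exp3 update; Exp3.M, cited in the snippet, generalizes this to multiple plays per round. The learning rate gamma is an assumption.

```python
# Basic Exp3 sketch (adversarial bandits); gamma is an assumption.
import math
import random

def exp3(reward_fn, n_arms, horizon, gamma=0.1):
    w = [1.0] * n_arms
    for t in range(horizon):
        total = sum(w)
        p = [(1 - gamma) * wi / total + gamma / n_arms for wi in w]
        k = random.choices(range(n_arms), weights=p)[0]
        r = reward_fn(k, t)                            # reward assumed in [0, 1]
        w[k] *= math.exp(gamma * (r / p[k]) / n_arms)  # importance-weighted boost
    return w
```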

The Pareto Frontier of model selection for general Contextual Bandits [article]

Teodor V. Marinov, Julian Zimmert
2021 arXiv   pre-print
Even in the purely stochastic regime, the desired results are unobtainable.  ...  Model selection for general contextual bandits with nested policy classes has come under particular scrutiny, resulting in a COLT 2020 open problem.  ...  We thank Tor Lattimore for pointing us to the technicalities required for bounding the total variation of improper algorithms.  ... 
arXiv:2110.13282v1 fatcat:ozhsrm2rfna4rfrvhffmj44hb4

AutoML for Contextual Bandits [article]

Praneet Dutta, Joe Cheuk, Jonathan S Kim, Massimo Mascaro
2022 arXiv   pre-print
Contextual bandits are among the most widely used techniques in applications such as personalization, recommendation systems, mobile health, causal marketing, etc.  ...  We see that our model outperforms or performs comparably to other models while requiring no tuning nor feature engineering.  ...  indicating it is a stochastic variable).  ... 
arXiv:1909.03212v2 fatcat:scixdaefijbupj4nw7wcdsesva

Semi-Parametric Sampling for Stochastic Bandits with Many Arms

Mingdong Ou, Nan Li, Cheng Yang, Shenghuo Zhu, Rong Jin
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)  
We consider the stochastic bandit problem with a large candidate arm set.  ...  In this setting, classic multi-armed bandit algorithms, which assume independence among arms and adopt a non-parametric reward model, are inefficient due to the large number of arms.  ...  Few works study the stochastic multi-armed bandit problem in a semi-parametric environment.  ... 
doi:10.1609/aaai.v33i01.33017933 fatcat:zzhp67pai5debm46pov7mvio5e

Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax [chapter]

Michel Tokic, Günther Palm
2011 Lecture Notes in Computer Science  
The method is evaluated in experiments having deterministic rewards and a mixture of both deterministic and stochastic rewards.  ...  The results show that a VDBE-Softmax policy can outperform ε-greedy, Softmax and VDBE policies in combination with on- and off-policy learning algorithms such as Q-learning and Sarsa.  ...  In addition to the original multi-armed bandit problem [9], the environment in the bandit world consists of multiple states and diverse bandits (in this example, states B1 and B2).  ... 
doi:10.1007/978-3-642-24455-1_33 fatcat:eod4k25jnrdhrfix4ea67krjba
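A hedged sketch of the VDBE-Softmax idea: the exploration probability is adapted from the TD error (large value changes trigger more exploration), and exploratory draws use a softmax over Q-values rather than a uniform choice. The sigma, delta, and temperature values are illustrative assumptions, not the paper's settings.

```python
# VDBE-Softmax sketch: adaptive epsilon driven by the TD error, with
# softmax exploration; sigma, delta, and temperature are assumptions.
import math
import random

def vdbe_update(eps, td_error, sigma=1.0, delta=0.5):
    """Move eps toward a Boltzmann-style function of the absolute TD error."""
    b = math.exp(-abs(td_error) / sigma)
    f = (1.0 - b) / (1.0 + b)
    return delta * f + (1.0 - delta) * eps

def select_action(q_values, eps, temperature=0.5):
    if random.random() < eps:                       # explore via softmax
        exps = [math.exp(q / temperature) for q in q_values]
        return random.choices(range(len(q_values)), weights=exps)[0]
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```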

Bayesian Unification of Gradient and Bandit-Based Learning for Accelerated Global Optimisation

Ole-Christoffer Granmo
2016 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)  
Our empirical results demonstrate that by unifying bandit and gradient based learning, one obtains consistently improved performance across a wide spectrum of problem environments.  ...  Due to the pervasiveness of bandit based optimisation, our scheme opens up for improved performance both in meta-optimisation and in applications where gradient related information is readily available  ...  In contrast, bandit algorithms are designed for on-line operation, aiming to converge to the optimal arm (global optima) in as few trials as possible.  ... 
doi:10.1109/icmla.2016.0044 dblp:conf/icmla/Granmo16 fatcat:3ep5f5abnnho7awhdrgcchfjou

Corralling a Band of Bandit Algorithms [article]

Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire
2017 arXiv   pre-print
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the  ...  The first is to create an algorithm that enjoys worst-case robustness while at the same time performing much better when the environment is relatively easy.  ... 

Algorithm         | R(T)                         | α   | Environment
ILOVETOCONBANDITS | O(√(KT ln |Θ|))              | 1/2 | stochastic contextual bandit
BISTRO+           | O((KT)^(2/3) (ln |Θ|)^(1/3)) | 1/3 | hybrid contextual bandit
Epoch-Greedy      | O(T^(2/3) (K ln |Θ|)^(1/3))  | 1/3 | stochastic contextual bandit  ... 
arXiv:1612.06246v3 fatcat:eighvfwsfzajfozv4sigtwjudq
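This is not the paper's Corral algorithm (which uses a log-barrier mirror-descent master with changing step sizes); the following EXP3-style master over base bandit learners only illustrates the "master picks a base learner each round" setup, with all constants and the base-learner interface assumed.

```python
# EXP3-style master over base bandit algorithms (illustrative only; the
# paper's Corral uses log-barrier mirror descent). Interface is assumed.
import math
import random

def corral_like(bases, env_step, horizon, gamma=0.1):
    """bases: objects exposing .act() -> arm and .feed(arm, reward)."""
    w = [1.0] * len(bases)
    for _ in range(horizon):
        total = sum(w)
        p = [(1 - gamma) * wi / total + gamma / len(bases) for wi in w]
        i = random.choices(range(len(bases)), weights=p)[0]
        arm = bases[i].act()                 # only the sampled base learner acts
        r = env_step(arm)                    # reward assumed in [0, 1]
        bases[i].feed(arm, r)                # Corral itself importance-weights this feedback
        w[i] *= math.exp(gamma * (r / p[i]) / len(bases))  # master update
    return w
```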

Data Poisoning Attacks on Stochastic Bandits [article]

Fang Liu, Ness Shroff
2019 arXiv   pre-print
In this paper, we propose a framework of offline attacks on bandit algorithms and study convex optimization based attacks on several popular bandit algorithms.  ...  Stochastic multi-armed bandits form a class of online learning problems that have important applications in online recommendation systems, adaptive medical treatment, and many others.  ...  showing that there is no robust and good stochastic bandit algorithm that can survive online poisoning attacks.  ...  For each reward rt returned from the bandit environment,  ... 
arXiv:1905.06494v1 fatcat:uhmscq7d2fhmjjxse7zcm2raam
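A toy flavor of an offline poisoning attack: perturb logged rewards so that a target arm's empirical mean dominates all others, spending as little perturbation as possible. This is a crude simplification of the paper's convex-optimization attacks; the uniform downward shift and the margin are assumptions.

```python
# Toy offline reward-poisoning sketch: shift each rival arm's rewards down
# just enough that the target arm's mean wins by `margin` (an assumption).
def poison(rewards_by_arm, target, margin=0.05):
    """rewards_by_arm: dict arm -> list of logged rewards (modified in place)."""
    t_mean = sum(rewards_by_arm[target]) / len(rewards_by_arm[target])
    for arm, rs in rewards_by_arm.items():
        if arm == target:
            continue
        mean = sum(rs) / len(rs)
        excess = mean - (t_mean - margin)
        if excess > 0:                       # shift every reward down uniformly
            for i in range(len(rs)):
                rs[i] -= excess
    return rewards_by_arm
```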

Bandit Algorithms for Precision Medicine [article]

Yangyi Lu, Ziping Xu, Ambuj Tewari
2021 arXiv   pre-print
details of bandit algorithms that have been used in mobile health.  ...  Since these reviews were published, bandit algorithms have continued to find uses in mobile health and several new topics have emerged in the research on bandit algorithms.  ...  In such scenarios, a best action that maximizes the total reward still exists, but algorithms designed for stochastic bandit environments are no longer guaranteed to work.  ... 
arXiv:2108.04782v1 fatcat:dni5wyzyerestgs3upuzz776n4

Active Reinforcement Learning: Observing Rewards at a Cost [article]

David Krueger, Jan Leike, Owain Evans, John Salvatier
2020 arXiv   pre-print
Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics.  ...  We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.  ...  The authors acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility in carrying out this work.  ... 
arXiv:2011.06709v2 fatcat:x5sce4tcavhrjjtqjybapmkeru
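A simple heuristic in the spirit of the ARL setting: pulling an arm is free, but observing its reward costs c, so the agent pays to observe only while an arm is still uncertain. The count-based threshold rule, cost, and epsilon are assumptions, not the paper's heuristics.

```python
# Active-bandit heuristic sketch: observe (and pay `cost`) only while an
# arm's pull count is below a threshold; all constants are assumptions.
import random

def active_eps_greedy(arms, horizon, cost=0.1, obs_threshold=30, eps=0.1):
    n = [0] * len(arms); mean = [0.0] * len(arms); payoff = 0.0
    for t in range(horizon):
        if t < len(arms) or random.random() < eps:
            k = random.randrange(len(arms))                   # explore
        else:
            k = max(range(len(arms)), key=lambda a: mean[a])  # exploit
        r = arms[k]()
        payoff += r
        if n[k] < obs_threshold:             # pay to observe only while uncertain
            payoff -= cost
            n[k] += 1
            mean[k] += (r - mean[k]) / n[k]
    return payoff, mean
```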
Showing results 1 — 15 out of 2,068 results