Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards [article]

Aadirupa Saha, Pierre Gaillard, Michal Valko
2020 arXiv   pre-print
In this paper, we consider the problem of sleeping bandits with stochastic action sets and adversarial rewards.  ...  In this setting, in contrast to most work in bandits, the actions may not be available at all times. For instance, some products might be out of stock in item recommendation.  ...  We also wish to thank Antoine Chambaz and Marie-Hélène Gbaguidi for supporting Aadirupa's internship at Inria, and Rianne de Heide for the thorough proofreading.  ... 
arXiv:2004.06248v2 fatcat:rk45lxboszbd7nswyqf3ezrrmq

Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards

Varun Kanade, H. Brendan McMahan, Brent Bryan
2009 Journal of machine learning research  
We consider the problem of selecting actions in order to maximize rewards chosen by an adversary, where the set of actions available on any given round is selected stochastically.  ...  For the bandit setting (where the algorithm only observes the reward of the action selected), we present a no-regret algorithm based on follow-the-perturbed-leader.  ...  This approach is not applicable in the sleeping experts/bandit setting, since some actions may be sleeping throughout the exploration phase.  ... 
dblp:journals/jmlr/KanadeMB09 fatcat:sut66bdyirff5bwwseqijesb4q

Regret bounds for sleeping experts and bandits

Robert Kleinberg, Alexandru Niculescu-Mizil, Yogeshwer Sharma
2010 Machine Learning  
We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial rewards models.  ...  We study on-line decision problems where the set of actions that are available to the decision algorithm varies over time.  ...  They were great help in improving the presentation and readability of the paper.  ... 
doi:10.1007/s10994-010-5178-7 fatcat:c2hd2hjnrzanvauaquum4avqhe

Multi-armed bandits with application to 5G small cells

Setareh Maghsudi, Ekram Hossain
2016 IEEE wireless communications  
with uncertainty and lack of information, and iii) can cope with users' selfishness.  ...  In particular, we provide a brief tutorial on bandit problems, including different variations and solution approaches.  ...  Bandits: Sleeping bandits refers to bandit problems where action set is time-varying.  ... 
doi:10.1109/mwc.2016.7498076 fatcat:uo5wmqp4i5ehlnxzk3v5yrk32e

Contextual Bandits with Cross-learning [article]

Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider
2021 arXiv   pre-print
In the classical contextual bandits problem, in each round t, a learner observes some context c, chooses some action i to perform, and receives some reward r_{i,t}(c).  ...  We design and analyze new algorithms for the contextual bandits problem with cross-learning and show that their regret has better dependence on the number of contexts.  ...  Regret Lower Bound with Stochastic Rewards and Adversarial Contexts We now present our second lower bound for the setting with stochastic rewards and adversarial contexts.  ... 
arXiv:1809.09582v3 fatcat:retak6gp3jdulla535u2zdmt6u

Adaptive Bandits: Towards the best history-dependent strategy

Odalric-Ambrym Maillard, Rémi Munos
2011 Journal of machine learning research  
This allows to model opponents (case 1) or strategies (case 2) which handles finite memory, periodicity, standard stochastic bandits and other situations.  ...  , i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model θ* ∈ Θ.  ...  to their usual definitions in stochastic and adversarial bandits, respectively.  ... 
dblp:journals/jmlr/MaillardM11 fatcat:qpote7g7bbd2xpodno7yegmg7i

Bandit Algorithms for Precision Medicine [article]

Yangyi Lu, Ziping Xu, Ambuj Tewari
2021 arXiv   pre-print
With their roots in the seminal work of Bellman, Robbins, Lai and others, bandit algorithms have come to occupy a central place in modern data science (Lattimore and Szepesvari, 2020).  ...  Since these reviews were published, bandit algorithms have continued to find uses in mobile health and several new topics have emerged in the research on bandit algorithms.  ...  We focus on the two key settings: stochastic bandit and adversarial bandit.  ... 
arXiv:2108.04782v1 fatcat:dni5wyzyerestgs3upuzz776n4

A Tutorial on Bandit Learning and Its Applications in 5G Mobile Edge Computing (Invited Paper)

Sige Liu, Peng Cheng, Zhuo Chen, Branka Vucetic, Yonghui Li
2022 Frontiers in Signal Processing  
In this paper, to deal with the above issues, we introduce bandit learning (BL), which enables each agent (MU/server) to make a sequential selection from a set of arms (servers/MUs) and then receive some  ...  numerical rewards.  ...  Non-Stochastic (Adversarial) Bandit Learning For non-stochastic (adversarial) BL, the reward generation of each arm does not have any specific probability distribution.  ... 
doi:10.3389/frsip.2022.864392 fatcat:njfrjplwcnh2phkly5dlwkefom

Survey on Fair Reinforcement Learning: Theory and Practice [article]

Pratik Gajane, Akrati Saxena, Maryam Tavakol, George Fletcher, Mykola Pechenizkiy
2022 arXiv   pre-print
Most of the research in fairness-aware learning employs the setting of fair-supervised learning.  ...  Our work is beneficial for both researchers and practitioners as we discuss articles providing mathematical guarantees as well as articles with empirical studies on real-world problems.  ...  In the adversarial setting, the rewards are generated by an adversary, and the reward probabilities may not be stationary.  ... 
arXiv:2205.10032v1 fatcat:rrc7a5aumnbe3dmptkeh5ohapa

Exploration and exploitation of scratch games

Raphaël Féraud, Tanguy Urvoy
2013 Machine Learning  
By adapting the standard multi-armed bandit algorithms to take advantage of this setting, we propose three new algorithms: the first one is designed for adversarial rewards; the second one assumes a stochastic  ...  For the adversarial and stochastic approaches, we provide upper bounds of the regret which compare favorably with the ones of EXP3 and UCB1.  ...  Acknowledgements We would like to thank anonymous reviewers and our colleagues Vincent Lemaire and Dominique Gay for their comments, which were helpful to improve the quality of this paper.  ... 
doi:10.1007/s10994-013-5359-2 fatcat:genww4scnbdhjaiixikc3c3wpa

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Sébastien Bubeck
2012 Foundations and Trends® in Machine Learning  
Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.  ...  The third fundamental model of multi-armed bandits assumes that the reward processes are neither i.i.d. (like in stochastic bandits) nor adversarial.  ...  Acknowledgments We would like to thank Mike Jordan for proposing to write this monograph and James Finlay for keeping us on track. The table of contents was laid down with the help of Gábor Lugosi.  ... 
doi:10.1561/2200000024 fatcat:fzpfffppvrfrle6vkj7z6wzh2e

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems [article]

Sébastien Bubeck, Nicolò Cesa-Bianchi
2012 arXiv   pre-print
Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.  ...  In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs.  ...  Acknowledgements We would like to thank Mike Jordan for proposing to write this survey and James Finlay for keeping us on track. The table of contents was laid down with the help of Gábor Lugosi.  ... 
arXiv:1204.5721v2 fatcat:kpclt3fswzewtcsjp7hkjncd6q

Dueling Bandits with Adversarial Sleeping [article]

Aadirupa Saha, Pierre Gaillard
2021 arXiv   pre-print
We introduce the problem of sleeping dueling bandits with stochastic preferences and adversarial availabilities (DB-SPAA).  ...  This indicates that the sleeping problem with preference feedback is inherently more difficult than that for classical multi-armed bandits (MAB).  ...  Besides the reward model, the set of available actions could also vary stochastically or adversarially [17, 24].  ... 
arXiv:2107.02274v1 fatcat:l4lqydiz6vff5ovp4si76shrzy

Online combinatorial optimization with stochastic decision sets and adversarial losses

Gergely Neu, Michal Valko
2014 Neural Information Processing Systems  
A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability  ...  In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions.  ...  [13] , who show that the sleeping experts problem with full information and stochastic availability is no more difficult than the standard experts problem.  ... 
dblp:conf/nips/NeuV14 fatcat:5cyi5c3zfvdp7lkki6ziyomiem

Dynamic Ad Allocation: Bandits with Budgets [article]

Aleksandrs Slivkins
2013 arXiv   pre-print
This model admits a natural variant of UCB1, a well-known algorithm for multi-armed bandits with stochastic rewards. We derive strong provable guarantees for this algorithm.  ...  We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities).  ...  Acknowledgements The author would like to thank Ashwin Badanidiyuru, Sebastien Bubeck and Robert Kleinberg for many stimulating conversations about multi-armed bandits.  ... 
arXiv:1306.0155v1 fatcat:amdub3ramfhvtkmjdaieykr6zi