
An empirical evaluation of active inference in multi-armed bandits [article]

Dimitrije Markovic, Hrvoje Stojic, Sarah Schwoebel, Stefan J. Kiebel
2021 arXiv   pre-print
Our empirical evaluation shows that the active inference algorithm does not produce efficient long-term behaviour in stationary bandits.  ...  The multi-armed bandit problem, a classical task that captures this trade-off, served as a vehicle in machine learning for developing bandit algorithms that proved to be useful in numerous industrial applications  ...  An important next step in examining active inference in the context of multi-armed bandits is to establish theoretical bounds on the cumulative regret for the stationary bandit problem.  ... 
arXiv:2101.08699v4 fatcat:swdx2m5l3zfdxgbkczjejbpa7e
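The stationary bandit setting and the cumulative regret that the snippet above proposes to bound can be made concrete with a minimal ε-greedy sketch (the arm probabilities, ε value, and function name are illustrative, not taken from the paper):

```python
import random

def run_bandit(probs, horizon, eps=0.1, seed=0):
    """Epsilon-greedy on a stationary Bernoulli bandit.

    Returns the cumulative (pseudo-)regret: the sum over rounds of the gap
    between the best arm's mean and the mean of the arm actually played.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    best = max(probs)
    regret = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)          # explore uniformly
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]    # running mean
        regret += best - probs[arm]
    return regret
```

With constant ε, regret grows linearly in the horizon; the theoretical bounds the snippet calls for concern algorithms whose regret grows only logarithmically.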

Multi-Fingered Active Grasp Learning [article]

Qingkai Lu, Mark Van der Merwe, Tucker Hermans
2020 arXiv   pre-print
We embed this within a multi-armed bandit formulation of sample selection.  ...  We base our approach on recent success in planning multi-fingered grasps as probabilistic inference with a learned neural network likelihood function.  ...  We empirically add a constant of 0.35, −0.05, and 0.6 to the max success, max uncertainty, and exploration arm rewards, respectively, for the multi-armed bandit active learning.  ... 
arXiv:2006.05264v2 fatcat:wa4a2qmxyjhj7budqwxbyl3qbq

Variational inference for the multi-armed contextual bandit [article]

Iñigo Urteaga, Chris H. Wiggins
2021 arXiv   pre-print
One general class of algorithms for optimizing interactions with the world, while simultaneously learning how the world operates, is the multi-armed bandit setting and, in particular, the contextual bandit  ...  We consider contextual multi-armed bandit applications where the true reward distribution is unknown and complex, which we approximate with a mixture model whose parameters are inferred via variational  ...  Acknowledgments This research was supported in part by NSF grant SCH-1344668. We thank Shipra Agrawal, David Blei and Daniel J. Hsu for discussions that helped motivate this work.  ... 
arXiv:1709.03163v3 fatcat:uvdodgigxfb6vnep5f5euv2no4

Online Context-Aware Recommendation with Time Varying Multi-Armed Bandit

Chunqiu Zeng, Qing Wang, Shekoofeh Mokhtari, Tao Li
2016 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16  
Contextual multi-armed bandit problems have gained increasing popularity and attention in recent years due to their capability of leveraging contextual information to deliver online personalized recommendation  ...  To predict the reward of each arm given a particular context, existing relevant research studies for contextual multi-armed bandit problems often assume the existence of a fixed yet unknown reward mapping  ...  The contextual multi-armed bandit problem is an instance of the bandit problem in which contextual information is used for arm selection.  ... 
doi:10.1145/2939672.2939878 dblp:conf/kdd/ZengWML16 fatcat:r2r3c54nbjeqtlbajymtkpnb6y

Handling Advertisements of Unknown Quality in Search Advertising

Sandeep Pandey, Christopher Olston
2006 Neural Information Processing Systems  
In this paper we study the tradeoff between exploration and exploitation, modeling advertisement placement as a multi-armed bandit problem.  ...  We extend traditional bandit formulations to account for budget constraints that occur in search engine advertising markets, and derive theoretical bounds on the performance of a family of algorithms.  ...  We evaluate our policies empirically over real-world data in Section 5.  ... 
dblp:conf/nips/PandeyO06 fatcat:74bg4hx2xbearpuol2iltprmzy

Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [article]

Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski
2020 arXiv   pre-print
Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) show good results on a global accuracy metric.  ...  Herein, we propose a novel approach reducing the amount of explicit feedback required by Combinatorial Multi-Armed Bandit (COM-MAB) algorithms while providing similar levels of global accuracy and learning  ...  Acknowledgments This work has been carried out by the KARA TECHNOLOGY company in partnership with LERIA laboratory (University of Angers, France) and ESEO-TECH (Angers, France) and, with the support of  ... 
arXiv:2009.07518v1 fatcat:hknfsllrczeylhsocryewgjsgi

Adapting multi-armed bandits policies to contextual bandits scenarios [article]

David Cortes
2019 arXiv   pre-print
This work explores adaptations of successful multi-armed bandit policies to the online contextual bandit scenario with binary rewards, using binary classification algorithms such as logistic regression  ...  In particular, the Adaptive-Greedy algorithm shows a lot of promise, in many cases achieving better performance than upper confidence bound and Thompson sampling strategies, at the expense of more hyperparameters  ...  An empirical evaluation of the adapted multi-armed bandit policies was performed, which in many cases showed better results compared to simpler baselines or to discarding the context.  ... 
arXiv:1811.04383v2 fatcat:abf6feswprgxrnmrbzcwou6yby

Adaptive demand response: Online learning of restless and controlled bandits

Qingsi Wang, Mingyan Liu, Johanna L. Mathieu
2014 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm)  
We formulate this problem as a multi-armed restless bandit problem with controlled bandits.  ...  Our problem has two features not commonly addressed in the bandit literature: the arms/processes evolve according to different probabilistic laws depending on the control, and the reward/feedback observed  ...  To put this in the context of the bandit problem framework, a load maps to an arm, and the deployment of a load maps to the activation or playing of an arm.  ... 
doi:10.1109/smartgridcomm.2014.7007738 dblp:conf/smartgridcomm/WangLM14 fatcat:ddsolmx3lndn7fzwv5gwda3hry

Bandit Label Inference for Weakly Supervised Learning [article]

Ke Li, Jitendra Malik
2015 arXiv   pre-print
The scarcity of data annotated at the desired level of granularity is a recurring issue in many applications.  ...  of weak supervision regimes, form of available data and prior knowledge of the task at hand.  ...  Labels are inferred in an efficient manner using a combinatorial multi-armed bandit algorithm; for this reason we dub the proposed method Bandit Label Inference as Supervisory Signal, or BLISS for short  ... 
arXiv:1509.06807v1 fatcat:3ba77wwpyjdwzjhfwhdmuxdhre

Multi-facet Contextual Bandits: A Neural Network Perspective [article]

Yikun Ban, Jingrui He, Curtiss B. Cook
2021 arXiv   pre-print
The contextual multi-armed bandit has been shown to be an effective tool in recommender systems.  ...  In this paper, we study a novel problem of multi-facet bandits involving a group of bandits, each characterizing the users' needs from one unique aspect.  ...  EXPERIMENTS To evaluate the empirical performance of MuFasa, in this section we design two different multi-facet bandit problems on three real-world data sets.  ... 
arXiv:2106.03039v3 fatcat:5junsq5mgrb2vpyd5iqwe27loa

Multi-armed Bandit Problem with Known Trend [article]

Djallel Bouneffouf, Raphaël Feraud
2017 arXiv   pre-print
We consider a variant of the multi-armed bandit model, which we call the multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution  ...  By adapting the standard multi-armed bandit algorithm UCB1 to take advantage of this setting, we propose the new algorithm, named A-UCB, that assumes a stochastic model.  ...  We provided an extension that allows the UCB algorithm to be used in the case of the MAB problem with known trend. Further, we provided an upper bound on the regret of the proposed algorithm.  ... 
arXiv:1508.07091v4 fatcat:zqfpwpls35bz3azu3ngwxd4yfu
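The entry above builds on UCB1, which plays the arm maximising an empirical mean plus a confidence bonus. A minimal sketch of plain UCB1 for Bernoulli arms follows (this is the standard textbook algorithm, not the paper's trend-aware A-UCB variant; names and parameters are illustrative):

```python
import math
import random

def ucb1(probs, horizon, seed=0):
    """UCB1 on Bernoulli arms: play each arm once, then pick the arm
    maximising  mean[a] + sqrt(2 * ln(t) / counts[a]).
    Returns per-arm pull counts."""
    rng = random.Random(seed)
    n = len(probs)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1                      # initialisation: each arm once
        else:
            arm = max(
                range(n),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]    # running mean
    return counts
```

The confidence bonus shrinks as an arm is pulled more often, so suboptimal arms are pulled only O(log T) times, which is where the paper's adapted regret bound attaches.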

Failure is Not an Option: Policy Learning for Adaptive Recovery in Space Operations

Steve McGuire, P. Michael Furlong, Christoffer Heckman, Simon Julier, Daniel Szafir, Nisar Ahmed
2018 IEEE Robotics and Automation Letters  
This paper considers the problem of how robots in long-term space operations can learn to choose appropriate sources of assistance to recover from failures.  ...  Proof of concept simulations of long-term human-robot interactions for space exploration are used to compare the performance of the contextual bandit against other state of the art assistant selection  ...  MULTI-ARM BANDIT ASSISTANCE ALLOCATION The optimal selection of an appropriate actor on the basis of dynamically changing information can be viewed as an instance of a contextual multi-arm bandit problem  ... 
doi:10.1109/lra.2018.2801468 dblp:journals/ral/McGuireFHJSA18 fatcat:jdesdvmppfg5vpfrftgm6x26bq

Interactive Social Recommendation

Xin Wang, Steven C.H. Hoi, Chenghao Liu, Martin Ester
2017 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM '17  
In the real world, new users may leave the systems for the reason of being recommended boring items before enough data is collected for training a good model, which results in an inefficient customer  ...  Social recommendation has been an active research topic over the last decade, based on the assumption that social information from friendship networks is beneficial for improving recommendation accuracy  ...  In this section, we will give a mathematical description of the general idea of the multi-armed bandit (MAB) strategy in the context of recommender systems, as well as several existing multi-armed bandit  ... 
doi:10.1145/3132847.3132880 dblp:conf/cikm/WangHLE17 fatcat:l4xwvhl67nhs7djignsv5obrne

Best-Arm Identification in Correlated Multi-Armed Bandits

Samarth Gupta, Gauri Joshi, Osman Yagan
2021 IEEE Journal on Selected Areas in Information Theory  
In this paper we consider the problem of best-arm identification in multi-armed bandits in the fixed confidence setting, where the goal is to identify, with probability 1-δ for some δ>0, the arm with the  ...  We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms in the form of upper bounds on the expected conditional reward of an arm, given a reward realization  ...  In [29], the authors evaluate a lower bound for the multi-armed bandit problem in the form of an optimization problem.  ... 
doi:10.1109/jsait.2021.3082028 fatcat:d4ql6qrchjhdzhzsqp4wskt65y

Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods [article]

Samuel Kessler, Arnold Salas, Vincent W. C. Tan, Stefan Zohren, Stephen Roberts
2020 arXiv   pre-print
We also demonstrate the quality of the derived uncertainty measures by comparing the performance of Badam to standard methods in a Thompson sampling setting for multi-armed bandits, where good uncertainty  ...  measures are required for an agent to balance exploration and exploitation.  ...  A detailed description of the datasets used for our multi-armed bandit experiments can also be found in the appendix of (Riquelme et al., 2018).  ... 
arXiv:1811.03679v3 fatcat:f4kjgcns4jghxicddjtggxcbx4
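The Thompson sampling setting mentioned above relies on posterior uncertainty to drive exploration. In the simplest conjugate case, Beta-Bernoulli bandits (a standard construction, unrelated to the paper's Badam optimiser), it can be sketched as:

```python
import random

def thompson_bernoulli(probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: draw a plausible mean for each arm
    from its Beta posterior, play the argmax, and update that arm's posterior.
    Returns per-arm pull counts."""
    rng = random.Random(seed)
    n = len(probs)
    alpha = [1] * n          # Beta(1, 1) uniform priors
    beta = [1] * n
    counts = [0] * n
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        arm = max(range(n), key=lambda a: samples[a])
        if rng.random() < probs[arm]:
            alpha[arm] += 1  # observed a success
        else:
            beta[arm] += 1   # observed a failure
        counts[arm] += 1
    return counts
```

Papers like the one above replace the exact Beta posterior with approximate neural-network posteriors, so the quality of the uncertainty estimate directly determines how well exploration is balanced.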