3,441 Hits in 3.3 sec

Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback [article]

Tianyu Wang, Lin F. Yang, Zizhuo Wang
2022 arXiv   pre-print
In this paper, we consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph, and the agent (i) initiates random walks over the graph by pulling arms  ...  , (ii) observes the random walk trajectories, and (iii) receives rewards equal to the lengths of the walks.  ...  In this section, we study the behavior of these two algorithms on bandit problems with random walk feedback.  ... 
arXiv:2011.01445v7 fatcat:fymxochsuvgrpci2ueg7mutxou
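
For intuition, here is a minimal sketch of the feedback model this abstract describes: pulling an arm starts a random walk on the graph, the learner observes the trajectory, and the reward is the walk's length. The toy graph, the absorbing node, and the uniform transitions are illustrative assumptions, not details from the paper.

```python
import random

def pull_arm(start, neighbors, absorbing, rng=random.Random(0)):
    """Walk from `start` until hitting `absorbing`; the observed
    trajectory is the feedback, its length the reward."""
    node, trajectory = start, [start]
    while node != absorbing:
        node = rng.choice(neighbors[node])  # uniform transition (assumption)
        trajectory.append(node)
    return trajectory, len(trajectory) - 1

# Toy graph: node 3 absorbs the walk (an assumption, so the walk terminates).
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [3]}
trajectory, reward = pull_arm(0, neighbors, absorbing=3)
print(trajectory, reward)
```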

The K-armed dueling bandits problem

Yisong Yue, Josef Broder, Robert Kleinberg, Thorsten Joachims
2012 Journal of computer and system sciences (Print)  
We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits).  ...  Analyzing the Random Walk Model Let $b^{(r)}$ and $\hat{b}^{(r)}$ denote the candidate bandits in round r of IF and the Random Walk Model described in Definition 1, respectively.  ...  We will prove that the random walk in the Random Walk Model requires O(log K) steps with high probability.  ... 
doi:10.1016/j.jcss.2011.12.028 fatcat:js6gr4zh6vamfcpu3o4ljl2bqy
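
The action model here, a noisy pairwise comparison instead of a numeric reward, fits in a few lines. The linear (utility-based) preference probability below is an illustrative assumption, not necessarily the paper's model.

```python
import random

def duel(mu_i, mu_j, rng=random.Random(1)):
    """Return True iff arm i wins the comparison; arm i is preferred
    with probability 1/2 + (mu_i - mu_j)/2 (assumed preference model)."""
    return rng.random() < 0.5 + (mu_i - mu_j) / 2.0

wins = sum(duel(0.7, 0.4) for _ in range(1000))
print(f"arm i won {wins}/1000 duels")  # roughly 650 expected
```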

Walk for Learning: A Random Walk Approach for Federated Learning from Heterogeneous Data [article]

Ghadir Ayache, Venkat Dassari, Salim El Rouayheb
2022 arXiv   pre-print
In this work, we study random walk (RW) learning algorithms for tackling the communication and data heterogeneity problems.  ...  Our numerical results validate our theoretical findings and show that our algorithm outperforms existing random walk algorithms.  ...  First, we present the details of our Sleeping Multi-Armed Bandit Random Walk SGD algorithm in Algorithm 1.  ... 
arXiv:2206.00737v1 fatcat:c4rsslxsxbdarewucuhasr5b7u
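
A minimal sketch of the random walk (RW) learning loop the abstract refers to: the model hops between clients along graph edges, and each visited client applies one local SGD step on its own data. Uniform neighbor sampling stands in for the paper's sleeping-MAB-driven node selection, and the toy scalar regression data is an illustrative assumption.

```python
import random

def random_walk_sgd(w, clients, neighbors, steps, lr=0.05, rng=random.Random(2)):
    node = rng.choice(list(neighbors))
    for _ in range(steps):
        x, y = rng.choice(clients[node])    # sample a local data point
        w -= lr * 2 * (w * x - y) * x       # SGD step on the loss (w*x - y)^2
        node = rng.choice(neighbors[node])  # hand the model to a neighbor
    return w

clients = {0: [(1.0, 2.0)], 1: [(2.0, 4.1)], 2: [(3.0, 5.9)]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(random_walk_sgd(0.0, clients, neighbors, steps=200))  # close to slope 2
```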

Adaptive crowdsourcing for temporal crowds

L. Elisa Celis, Koustuv Dasgupta, Vaibhav Rajan
2013 Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion  
Bandit problems embody in essential form a conflict evident in all human action: information versus immediate payoff. -P.  ...  In this paper, we address the changing crowds problem and, specifically, propose a multi-armed bandit based framework. We introduce the simple ε-smart algorithm that performs robustly.  ...  walks or Brownian motion, and hence here we use a random walk model.  ... 
doi:10.1145/2487788.2488125 dblp:conf/www/CelisDR13 fatcat:ry25snueyffitbx3wvgpverhee

Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback [article]

Zongqi Wan, Xiaoming Sun, Jialin Zhang
2022 arXiv   pre-print
In particular, for K-armed bandits and bandit convex optimization, we obtain an $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for the K-armed bandit.  ...  We study the adversarial bandit problem with composite anonymous delayed feedback.  ...  To address this problem, we borrow the idea of the multi-scale random walk from [DDKP14]. The multi-scale random walk is a trade-off between a plain random walk and i.i.d. samples.  ... 
arXiv:2204.12764v2 fatcat:uwumzddq2ra7zclmady4wx2goe

Crawling the Community Structure of Multiplex Networks

Ricky Laishram, Jeremy D. Wendt, Sucheta Soundarajan
2019 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-19)  
MCS uses multiple levels of multi-armed bandits to determine the best layers, communities, and node roles for selecting nodes to query.  ...  The running time for the random walk (green) is also added as a reference.  ...  Figure 3 shows the number of edges found against time for MCS and the random walk on the Twitter API. We observe that MCS scales very well with network size, and overtakes the random walk.  ... 
doi:10.1609/aaai.v33i01.3301168 fatcat:tty2wsjpx5eybizcarx3pn7tea

Online Context-Aware Recommendation with Time Varying Multi-Armed Bandit

Chunqiu Zeng, Qing Wang, Shekoofeh Mokhtari, Tao Li
2016 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16  
In the proposed model, the drift in the reward mapping function is explicitly modeled as a set of random walk particles, where well-fitted particles are selected to learn the mapping dynamically.  ...  Contextual multi-armed bandit problems have gained increasing popularity and attention in recent years due to their capability of leveraging contextual information to deliver online personalized recommendation  ...  Since the state $\eta_{k,t-1}$ changes over time with a standard Gaussian random walk, it follows a Gaussian distribution after accumulating t − 1 standard Gaussian random walks.  ... 
doi:10.1145/2939672.2939878 dblp:conf/kdd/ZengWML16 fatcat:r2r3c54nbjeqtlbajymtkpnb6y
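
The snippet's distributional claim follows from writing the particle state as its initial value plus the accumulated standard Gaussian increments:

$$\eta_{k,t-1} = \eta_{k,0} + \sum_{i=1}^{t-1} \epsilon_i, \qquad \epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1) \quad\Longrightarrow\quad \eta_{k,t-1} \sim \mathcal{N}(\eta_{k,0},\; t-1).$$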

Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax [chapter]

Michel Tokic, Günther Palm
2011 Lecture Notes in Computer Science  
$$\pi(s) = \begin{cases} \text{random action from } A(s) & \text{if } \xi < \varepsilon \\ \operatorname{argmax}_{a \in A(s)} Q(s, a) & \text{otherwise}, \end{cases} \qquad (4)$$ where $0 \leq \xi \leq 1$ is a uniform random number drawn at each time step.  ...  bandit.  ... 
doi:10.1007/978-3-642-24455-1_33 fatcat:eod4k25jnrdhrfix4ea67krjba
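
Equation (4) transcribes directly into code. The sketch below implements only the fixed-ε policy shown in the snippet; VDBE's actual contribution, adapting ε from value differences, is omitted.

```python
import random

def epsilon_greedy(Q, s, actions, epsilon, rng=random.Random(3)):
    xi = rng.random()                   # uniform draw, 0 <= xi <= 1
    if xi < epsilon:
        return rng.choice(actions)      # explore: random action from A(s)
    return max(actions, key=lambda a: Q[(s, a)])  # exploit: argmax_a Q(s, a)

Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
print(epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.1))
```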

The Use of Bandit Algorithms in Intelligent Interactive Recommender Systems [article]

Qing Wang
2021 arXiv   pre-print
However, few existing bandit models are able to adapt to new changes introduced by modern recommender systems.  ...  Multi-armed bandit algorithms, which have been widely applied to various online systems, are quite capable of delivering such efficient recommendation services.  ...  Since the state $\eta_{k,t-1}$ changes over time with a standard Gaussian random walk, it follows a Gaussian distribution after accumulating t − 1 standard Gaussian random walks.  ... 
arXiv:2107.00161v1 fatcat:dv2s3ezfmjazxpeekhskalwh5u

Attenuated directed exploration during reinforcement learning in gambling disorder

A. Wiehler, K. Chakroun, J. Peters
2021 Journal of Neuroscience  
changes in brain activity in a fronto-parietal exploration-related network. Twenty-three frequent, non-treatment-seeking gamblers and twenty-three healthy matched controls (all male) performed a four-armed bandit  ...  Computational modeling using hierarchical Bayesian parameter estimation revealed signatures of directed exploration, random exploration, and perseveration in both groups.  ...  B, Payouts fluctuated across the 300 trials of the experiment according to Gaussian random walks. Here, one example set of random walks is shown.  ... 
doi:10.1523/jneurosci.1607-20.2021 pmid:33531415 pmcid:PMC7984586 fatcat:s3syynxfsvchrfc2ef3teqhwzy
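
For concreteness, this is roughly how restless-bandit payouts driven by Gaussian random walks are generated; the payout range, step size, and clipping rule are illustrative assumptions, not the study's parameters.

```python
import random

def gaussian_walk_payouts(n_trials=300, n_arms=4, sd=4.0, rng=random.Random(4)):
    payouts = [[rng.uniform(20, 80) for _ in range(n_arms)]]
    for _ in range(n_trials - 1):
        payouts.append([min(100.0, max(0.0, p + rng.gauss(0.0, sd)))
                        for p in payouts[-1]])  # Gaussian step, clipped to [0, 100]
    return payouts

walks = gaussian_walk_payouts()
print(len(walks), [round(p, 1) for p in walks[-1]])
```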

Exact simulation of diffusion first exit times: algorithm acceleration [article]

Samuel Herrmann, Cristina Zucca
2020 arXiv   pre-print
In order to describe or estimate different quantities related to a specific random variable, it is of prime interest to numerically generate such a variate.  ...  In this paper the authors highlight an acceleration procedure for the GDET-algorithm based on a multi-armed bandit model. The efficiency of this acceleration is pointed out through numerical examples.  ...  Section 2 concerns the introduction of the random walk on small rectangles of area 2T × [a, b]/N . A multi-armed bandit method is introduced in Section 3 for the optimal choice of the parameter N .  ... 
arXiv:2004.02313v1 fatcat:4d3eqzstqrhzlm74ghjof3qe64
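
The acceleration idea, using a multi-armed bandit to pick the parameter N, can be sketched with a standard UCB1 loop; the candidate values and the synthetic "speed" reward below are illustrative assumptions, not the paper's setup.

```python
import math, random

rng = random.Random(5)
candidates = [4, 8, 16, 32]   # hypothetical values of N

def speed(n):                 # fake reward in [0, 1]: higher means a faster run
    return 1.0 / (1.0 + abs(n - 16) + rng.random())

counts = [1] * len(candidates)
means = [speed(n) for n in candidates]            # one warm-up pull per arm
for t in range(len(candidates), 500):
    a = max(range(len(candidates)),               # UCB1 index
            key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
    r = speed(candidates[a])
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]
print("best N:", candidates[max(range(len(counts)), key=counts.__getitem__)])
```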

Laplacian-regularized graph bandits: Algorithms and theoretical analysis [article]

Kaige Yang and Xiaowen Dong and Laura Toni
2020 arXiv   pre-print
We introduce a novel bandit algorithm where the smoothness prior is imposed via the random-walk graph Laplacian, which leads to a single-user cumulative regret scaling as $\tilde{\mathcal{O}}(\Psi d \sqrt{T})$ with time horizon  ...  We consider a stochastic linear bandit problem with multiple users, where the relationship between users is captured by an underlying graph and user preferences are represented as smooth signals on the  ...  In this paper, we address the above limitations with the following main contributions: • We propose a bandit algorithm GraphUCB based on the random-walk graph Laplacian, and show its theoretical advantages  ... 
arXiv:1907.05632v3 fatcat:7wlowgrcgjgdffwjpv5ocntesm
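
The random-walk graph Laplacian named in the abstract is L_rw = I − D⁻¹A, with adjacency matrix A and degree matrix D; its eigenvalues lie in [0, 2], unlike those of the combinatorial Laplacian D − A. A minimal computation on an assumed toy user graph:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # toy user graph (assumption)
D_inv = np.diag(1.0 / A.sum(axis=1))       # inverse degree matrix
L_rw = np.eye(len(A)) - D_inv @ A          # random-walk graph Laplacian
print(np.round(L_rw, 2))
```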

Bandits with switching costs

Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
2014 Proceedings of the 46th Annual ACM Symposium on Theory of Computing - STOC '14  
The key to all of our results is a new randomized construction of a multi-scale random walk, which is of independent interest and likely to prove useful in additional settings.  ...  We study the adversarial multi-armed bandit problem in a setting where the player incurs a unit cost each time he switches actions.  ...  This consideration rules out the simple Gaussian random walk, whose depth is T. Definition 2 (cut, width).  ... 
doi:10.1145/2591796.2591868 dblp:conf/stoc/DekelDKP14 fatcat:xkj74tb3qbfqtjtusftuuxccc4
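
A sketch of the multi-scale random walk: round t is chained not to t − 1 but to a parent ρ(t) = t − 2^{δ(t)}, where δ(t) is the largest power of two dividing t, so every round has at most ⌈log₂ T⌉ ancestors, versus depth T for the plain Gaussian walk the snippet rules out. Treat the parent rule below as a sketch of the construction rather than the paper's exact definition.

```python
import random

def multiscale_walk(T, sigma=1.0, rng=random.Random(6)):
    W = {0: 0.0}
    for t in range(1, T + 1):
        delta = (t & -t).bit_length() - 1  # largest i with 2^i dividing t
        parent = t - (1 << delta)          # rho(t): skip back 2^delta rounds
        W[t] = W[parent] + rng.gauss(0.0, sigma)
    return [W[t] for t in range(T + 1)]

print([round(w, 2) for w in multiscale_walk(8)])
```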

Bandits with Switching Costs: T^{2/3} Regret [article]

Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
2013 arXiv   pre-print
The key to all of our results is a new randomized construction of a multi-scale random walk, which is of independent interest and likely to prove useful in additional settings.  ...  We study the adversarial multi-armed bandit problem in a setting where the player incurs a unit cost each time he switches actions.  ...  This consideration rules out the simple Gaussian random walk, whose depth is T. Definition 2 (cut, width).  ... 
arXiv:1310.2997v2 fatcat:ngxldnhk5fbm5pp7zjavn4ycue

Whom to Test? Active Sampling Strategies for Managing COVID-19 [article]

Yingfei Wang, Inbal Yahav, Balaji Padmanabhan
2020 arXiv   pre-print
The bandit algorithm uses contact tracing, location-based sampling, and random sampling in order to select specific individuals to test.  ...  The smart-testing ideas presented here are motivated by active learning and multi-armed bandit techniques in machine learning.  ...  Then the random walk is confined to the pre-defined meta-paths.  ... 
arXiv:2012.13483v1 fatcat:57s4yemryzapnkntd2dgim44li
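
A sketch of a random walk confined to a meta-path on a typed contact graph: each hop must match the next node type in the pattern. The person-place-person meta-path and the toy graph are illustrative assumptions in the spirit of the location-based sampling mentioned above.

```python
import random

def metapath_walk(start, neighbors, node_type, metapath, rng=random.Random(7)):
    node, walk = start, [start]
    for want in metapath[1:]:
        typed = [n for n in neighbors[node] if node_type[n] == want]
        if not typed:
            break                 # walk dies if no neighbor of the right type
        node = rng.choice(typed)
        walk.append(node)
    return walk

node_type = {"p1": "person", "p2": "person", "loc": "place"}
neighbors = {"p1": ["loc"], "p2": ["loc"], "loc": ["p1", "p2"]}
print(metapath_walk("p1", neighbors, node_type, ["person", "place", "person"]))
```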
Showing results 1 — 15 out of 3,441 results