Solving Multi-Arm Bandit Using a Few Bits of Communication
[article]
2021
arXiv
pre-print
The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. ...
only a few (as low as 3) bits to be sent per iteration while preserving the same regret bound. ...
to a small constant factor, while using a few bits of communication. ...
arXiv:2111.06067v1
fatcat:4634ecqi65e2xobf3or3j676ay
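The few-bits-per-iteration idea above can be illustrated with a stochastic quantizer: a reward in [0, 1] is encoded with a handful of bits while staying unbiased in expectation, so a learner's expected updates are unchanged. This is a minimal sketch of the general technique, not the paper's actual protocol; the function names are illustrative.

```python
import random

def quantize(reward, bits=3):
    """Stochastically quantize a reward in [0, 1] to `bits` bits.

    Rounding up with probability equal to the fractional part makes the
    dequantized value an unbiased estimate of the true reward.
    """
    levels = (1 << bits) - 1              # number of grid intervals
    scaled = reward * levels
    low = int(scaled)                     # lower grid point
    index = low + (1 if random.random() < scaled - low else 0)
    return index                          # integer in [0, levels], fits in `bits` bits

def dequantize(index, bits=3):
    """Map a quantization index back to a reward estimate in [0, 1]."""
    return index / ((1 << bits) - 1)
```

With 3 bits there are 8 grid points, yet averaging many dequantized samples recovers the true mean reward, which is why regret guarantees can survive the compression.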
User pairing using laser chaos decision maker for NOMA systems
2022
Nonlinear Theory and Its Applications IEICE
In the meantime, ultrafast methods of solving multi-armed bandit problems have been developed using chaotic laser time series. ...
In this paper, we consider the user pairing problem in Non-Orthogonal Multiple Access as a multi-armed bandit problem and propose an ultra-fast user pairing algorithm based on the laser chaos decision ...
Introduction The ultrafast decision-maker has been demonstrated to solve multi-armed bandit (MAB) problems at GHz rates using chaotic laser time series [1, 2] . ...
doi:10.1587/nolta.13.72
fatcat:maqdxrr2drczjk4iymfhuoliqy
Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication
[article]
2019
arXiv
pre-print
For distributed multi-armed bandits, we propose a protocol with near-optimal regret and only O(Mlog(MK)) communication cost, where K is the number of arms. ...
The communication cost is independent of the time horizon T, has only logarithmic dependence on the number of arms, and matches the lower bound except for a logarithmic factor. ...
A UCB-based Protocol for Multi-armed Bandits In classic multi-armed bandit problems, upper confidence bound (UCB) algorithms are very efficient at solving the regret minimization problem. ...
arXiv:1904.06309v2
fatcat:mukrdozn5bddnpkd5a3lkrbq5y
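The UCB algorithm mentioned in the snippet above can be sketched in a few lines: pull each arm once, then always pick the arm maximizing empirical mean plus a confidence bonus. This is standard UCB1, not the distributed protocol of the paper; the `pull` callback and Bernoulli arms are illustrative.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Minimal UCB1: after one initialisation pull per arm, choose the
    arm with the highest empirical mean plus exploration bonus."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t                      # initialisation round
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts

# usage with three Bernoulli arms of known means
random.seed(1)
means = [0.2, 0.5, 0.8]
counts = ucb1(lambda a: 1.0 if random.random() < means[a] else 0.0,
              n_arms=3, horizon=2000)
```

Over 2000 rounds the best arm (mean 0.8) accumulates the vast majority of pulls, while each suboptimal arm is pulled only O(log T / Δ²) times, which is the behavior distributed protocols aim to preserve while compressing communication.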
Network of Bandits insure Privacy of end-users
[article]
2017
arXiv
pre-print
We provide a first algorithm, Distributed Median Elimination, which is optimal in terms of the number of transmitted bits and near-optimal in terms of the speed-up factor with respect to an optimal algorithm run ...
In order to distribute the best arm identification task as close as possible to the user's devices, on the edge of the Radio Access Network, we propose a new problem setting, where distributed players ...
Recent years have seen increasing interest in the study of the collaborative distribution scheme: N players collaborate to solve a multi-armed bandit problem. ...
arXiv:1602.03779v14
fatcat:4b57prgxnfgmlgiegoytikcz3u
Meta-learning of Exploration/Exploitation Strategies: The Multi-armed Bandit Case
[chapter]
2013
Communications in Computer and Information Science
The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. ...
a large hypothesis space of candidate E/E strategies; and (iii), solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the ...
Having a linear dependency between n_p and d is a classical choice when using EDAs [14] . Note that, in most cases, the optimization is solved in a few to a few tens of iterations. ...
doi:10.1007/978-3-642-36907-0_7
fatcat:helibxckl5da3gw7m43k5mlck4
Faster Activity and Data Detection in Massive Random Access: A Multi-armed Bandit Approach
[article]
2020
arXiv
pre-print
To further improve the convergence rate, an inner multi-armed bandit problem is established to learn the exploration policy of Bernoulli sampling. ...
In this paper, we develop multi-armed bandit approaches for more efficient detection via coordinate descent, which make a delicate trade-off between exploration and exploitation in coordinate selection ...
Due to sporadic communications, only a few out of all devices are active at a given time instant [22] . ...
arXiv:2001.10237v1
fatcat:3joodcctcbajhhb3nw2yuk3oou
A program for sequential allocation of three Bernoulli populations
1999
Computational Statistics & Data Analysis
As an illustration, the program is used to create an adaptive sampling procedure that is the optimal solution to a 3-arm bandit problem. ...
Extensions enabling the program to solve a variety of related problems are discussed. ...
Computational support was provided by the Center for Parallel Computing at the University of Michigan. We are grateful for the comments of three referees who reviewed this paper. ...
doi:10.1016/s0167-9473(99)00039-0
fatcat:x67x6276xzfutf2vdan4w53lne
Adaptive decision making using a chaotic semiconductor laser for multi-armed bandit problem with time-varying hit probabilities
2022
Nonlinear Theory and Its Applications IEICE
We numerically demonstrate the principle of adaptive decision making for solving multi-armed bandit problems in dynamically changing reward environments. ...
We use the tug-of-war method, comparing a threshold with a chaotic temporal waveform generated from a semiconductor laser and observed in an experiment. ...
Acknowledgments We acknowledge the support of the Japan Society for the Promotion of Science (JP19H00868, JP20K15185, and JP20H00233), JST CREST (JPMJCR17N2), and the Telecommunications Advancement Foundation ...
doi:10.1587/nolta.13.112
fatcat:hqqv3jiiyjbnbp2wq6vy3mbnru
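The threshold-comparison idea in the entry above can be caricatured as follows: each step compares a signal sample against an adaptive threshold to pick an arm, and the threshold is nudged by the observed reward; an exponential decay lets old evidence fade so the decision can track time-varying probabilities. This is a heavily simplified two-arm sketch in the spirit of the tug-of-war method, not the authors' actual dynamics, and the uniform-random signal merely stands in for a chaotic laser waveform.

```python
import random

def threshold_decision(pull, signal, alpha=0.99, delta=1.0):
    """Pick arm 1 when the signal sample exceeds the threshold, arm 0
    otherwise; shift the threshold toward the arm that just 'lost'."""
    th = 0.0
    pulls = [0, 0]
    for s in signal:
        arm = 1 if s > th else 0
        r = pull(arm)
        pulls[arm] += 1
        # a rewarded arm pulls the threshold to its own side so it is
        # chosen more often; the alpha decay forgets stale evidence
        th = alpha * th + (-delta if (arm == 1) == (r > 0) else delta)
    return pulls

# usage: arm 1 pays off with probability 0.8, arm 0 with 0.2
random.seed(2)
signal = [random.uniform(-1.0, 1.0) for _ in range(2000)]
pulls = threshold_decision(
    lambda a: 1.0 if random.random() < (0.8 if a == 1 else 0.2) else 0.0,
    signal)
```

Because the comparison and update are a single arithmetic operation per sample, this style of decision-making can run at whatever rate the physical signal is sampled, which is the appeal of the laser-chaos approach.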
Multi-Agent Multi-Armed Bandits with Limited Communication
[article]
2021
arXiv
pre-print
We consider the problem where N agents collaboratively interact with an instance of a stochastic K-armed bandit problem for K ≫ N. ...
The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. ...
INTRODUCTION We consider a setup where N agents, connected over a network, interact with a multi-armed bandit (MAB) environment (Lattimore and Szepesvári, 2020) . ...
arXiv:2102.08462v1
fatcat:nocqx3l7fnes5hrytmutbmxjqm
Study of Multi-Armed Bandits for Energy Conservation in Cognitive Radio Sensor Networks
2015
Sensors
In order to achieve this goal, the paper introduces the concept of a bounded MAB to find the optimal packet size to transfer by formulating different packet sizes for different arms under the channel condition ...
Technological advances have led to the emergence of wireless sensor nodes in wireless networks. Sensor nodes are usually battery powered and hence have strict energy constraints. ...
Conflicts of Interest The authors declare no conflict of interest. ...
doi:10.3390/s150409360
pmid:25905702
pmcid:PMC4431283
fatcat:td7lkevqhratxgm5c6wbuoaatm
Stochastic Contextual Bandits with Known Reward Functions
[article]
2016
arXiv
pre-print
Many sequential decision-making problems in communication networks can be modeled as contextual bandit problems, which are natural extensions of the well-known multi-armed bandit problem. ...
of arms. ...
When the number of bits to be sent at each time is finite, this represents the case of discrete contextual bandits considered in this paper. 2) Energy Harvesting Communications: Consider a power-aware ...
arXiv:1605.00176v2
fatcat:hefngf7i6ffddoopp27wzrlegi
Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory
[article]
2019
arXiv
pre-print
Designing an efficient regret minimisation algorithm that uses a constant number of words has long been of interest to the community. ...
In this paper, we propose a constant word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. ...
We find that providing a lower bound on the cumulative regret under the bounded arm memory constraint is an interesting question, and we leave that for future investigation. ...
arXiv:1901.08387v1
fatcat:7a6fflsa4fgl3iqdsbxsy4wzxq
Optimal Transmission Rate Control Policies in a Wireless Link Under Partial State Information
2010
IEEE Transactions on Automatic Control
The policy admits a simple interpretation: increase rate when the number of successive ACKs exceeds a threshold, and decrease rate when the number of successive NACKs exceeds a threshold. ...
We consider the problem of PHY layer transmission rate control for maximum throughput on a wireless link over a finite time horizon. ...
, or a degenerate case of a multi-armed bandit problem. ...
doi:10.1109/tac.2009.2033839
fatcat:2gsxwt5nnff3phwwbh3xilo6ha
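The threshold policy summarized in the entry above (increase rate after enough consecutive ACKs, decrease after enough consecutive NACKs) is simple enough to state directly. This is a minimal sketch of that policy shape under assumed parameters, not the paper's derived optimal thresholds; the rate table and counters are illustrative.

```python
def threshold_rate_controller(feedback, rates, k_up=3, k_down=2):
    """Threshold policy: move one rate step up after k_up consecutive
    ACKs, one step down after k_down consecutive NACKs."""
    level, acks, nacks = 0, 0, 0
    chosen = []
    for ok in feedback:                   # True = ACK, False = NACK
        chosen.append(rates[level])
        if ok:
            acks, nacks = acks + 1, 0
            if acks >= k_up and level < len(rates) - 1:
                level, acks = level + 1, 0
        else:
            nacks, acks = nacks + 1, 0
            if nacks >= k_down and level > 0:
                level, nacks = level - 1, 0
    return chosen

# usage: three ACKs step the rate up, two NACKs step it back down
chosen = threshold_rate_controller(
    [True, True, True, True, True, True, False, False],
    rates=[1, 2, 4, 8])
```

The counters reset on every rate change, so the controller needs only the run length of identical feedback, not the full channel state, which is what makes the policy attractive under partial state information.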
Remote Contextual Bandits
[article]
2022
arXiv
pre-print
We consider a remote contextual multi-armed bandit (CMAB) problem, in which the decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited ...
In this remote CMAB (R-CMAB) problem, the constraint on the communication rate between the decision-maker and the agents imposes a trade-off between the number of bits sent per agent and the acquired average ...
This formulation is different from the existing results in the literature involving multi-agent multi-armed bandit (MAB). In [16] , each agent can pull an arm and communicate with others. ...
arXiv:2202.05182v1
fatcat:efdjgux6xfbxbmvoronut2scie
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
[article]
2022
arXiv
pre-print
In summary, we believe our reduction idea will find broader scope in solving a diverse class of dueling bandit settings, which are otherwise studied separately from multi-armed bandits with often more ...
We first propose a novel reduction from any (general) dueling bandit to multi-armed bandits and, despite its simplicity, it allows us to improve many existing results in dueling bandits. ...
Acknowledgment Thanks to Julian Zimmert and Karan Singh for the useful discussions on the existing best-of-both-world multiarmed bandits results. ...
arXiv:2202.06694v1
fatcat:hd2j4clntzafzhcdjsndsek3gq
Showing results 1 — 15 out of 1,457 results