
Cooperative Multi-Agent Bandits with Heavy Tails [article]

Abhimanyu Dubey, Alex Pentland
2020 arXiv   pre-print
We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interacts with a common bandit problem while communicating on a network with delays.  ...  We propose MP-UCB, a decentralized multi-agent algorithm for the cooperative stochastic bandit that incorporates robust estimation with a message-passing protocol.  ...  It is therefore prudent to devise methods that are robust to such heavy tails.  ...  Next, we present an algorithm MP-UCB for the cooperative multi-agent stochastic bandit under heavy-tailed densities.  ...
arXiv:2008.06244v1 fatcat:2cjp4v6hrnccbdjmfdrhkc2gru
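
The abstract names robust estimation as the key ingredient for heavy-tailed rewards. Below is a minimal single-agent sketch using a truncated empirical mean, one standard robust estimator in this literature; the truncation level b, the horizon, and the Pareto arms are illustrative assumptions, and MP-UCB's message-passing protocol is not reproduced here.

```python
import numpy as np

def truncated_mean(samples, b):
    """Robust mean: samples with magnitude above b are zeroed out."""
    s = np.asarray(samples)
    return float(np.mean(np.where(np.abs(s) <= b, s, 0.0)))

def robust_ucb(pull, K, T, b=10.0):
    """UCB with truncated means; pull(k) returns a possibly heavy-tailed reward."""
    rewards = [[pull(k)] for k in range(K)]        # pull each arm once
    for t in range(K, T):
        idx = [truncated_mean(rewards[k], b)
               + b * np.sqrt(np.log(t + 1) / len(rewards[k]))
               for k in range(K)]
        k = int(np.argmax(idx))
        rewards[k].append(pull(k))
    return [float(np.mean(r)) for r in rewards]

rng = np.random.default_rng(0)
# Three arms with Pareto (heavy-tailed) rewards of increasing scale.
print(robust_ucb(lambda k: (k + 1) * rng.pareto(3.0), K=3, T=2000))
```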

Competing Bandits: The Perils of Exploration Under Competition [article]

Guy Aridor and Yishay Mansour and Aleksandrs Slivkins and Zhiwei Steven Wu
2021 arXiv   pre-print
We consider a stylized duopoly model in which two firms face the same multi-armed bandit problem.  ...  Here users play three distinct roles: they are customers who generate revenue, they are sources of data for learning, and they are self-interested agents who choose among the competing platforms.  ...  Table 19 (Duopoly Experiment: Heavy-Tail, K = 3, T = 5000; columns T_0 = 20, 250, 500): each cell describes a game between two algorithms, call them Alg1 vs.  ...
arXiv:2007.10144v5 fatcat:x3zbcwdbubcylfqzuxpj2xkpe4
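
The duopoly experiment referenced in the table caption can be illustrated with a hypothetical toy simulation: two greedy bandit platforms face the same arms, and each arriving user joins the platform with the better reputation (average realized reward so far). All parameters are assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 3, 5000
mu = rng.uniform(0, 1, K)                     # shared arm means for both firms

class GreedyPlatform:
    def __init__(self):
        self.n = np.zeros(K)
        self.s = np.zeros(K)
    def serve(self):
        est = np.where(self.n > 0, self.s / np.maximum(self.n, 1), np.inf)
        k = int(np.argmax(est))               # greedy on current estimates
        r = mu[k] + rng.normal(0, 0.1)
        self.n[k] += 1
        self.s[k] += r
        return r

firms = [GreedyPlatform(), GreedyPlatform()]
history = [[], []]                            # realized rewards = reputation
for _ in range(T):
    avg = [np.mean(h) if h else 0.5 for h in history]
    i = int(np.argmax(avg)) if avg[0] != avg[1] else int(rng.integers(2))
    history[i].append(firms[i].serve())

print(len(history[0]), len(history[1]))       # how the market splits between firms
```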

Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions [article]

Junyan Liu, Shuai Li, Dapeng Li
2021 arXiv   pre-print
We study the problem of stochastic bandits with adversarial corruptions in the cooperative multi-agent setting, where V agents interact with a common K-armed bandit problem, and each pair of agents can communicate with each other to expedite the learning process.  ...  Cooperative multi-agent bandits with heavy tails. In ICML, volume 119, pages 2730-2739, 2020. Eyal Even-Dar, Shie Mannor, and Yishay Mansour.  ...
arXiv:2106.04207v1 fatcat:vxv44zq4vbgwhizsktykxtm4ay

Solving Multi-Arm Bandit Using a Few Bits of Communication [article]

Osama A. Hanna, Lin F. Yang, Christina Fragouli
2021 arXiv   pre-print
The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards.  ...  In this paper we address the communication problem by optimizing the communication of rewards collected by distributed agents.  ...  (2) How to deal with heavy-tailed noise? (3) How to convey contexts in the contextual bandit setting if these are not implicitly conveyed?  ...
arXiv:2111.06067v1 fatcat:4634ecqi65e2xobf3or3j676ay
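
The communication question in this abstract invites a concrete illustration. One standard trick, used here as an assumed example rather than the paper's actual scheme, is stochastic one-bit quantization: for a reward r in [0, 1], the agent sends a single Bernoulli(r) bit, which is an unbiased one-bit estimate of r.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_bit(r):
    """Encode a reward r in [0, 1] as one bit with E[bit] = r."""
    return int(rng.random() < r)

# A server averaging the bits recovers r without full-precision rewards.
true_r = 0.37
bits = [one_bit(true_r) for _ in range(10_000)]
print(np.mean(bits))   # close to 0.37, at one bit per reward
```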

Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [article]

Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel D. Riek, Kamalika Chaudhuri
2021 arXiv   pre-print
We formulate this problem as the ϵ-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players  ...  In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol.  ...  Dubey and Pentland (2020a) investigate multi-agent bandits with heavy-tailed rewards.  ... 
arXiv:2010.15390v2 fatcat:obhmf2l7zvcjbns6ailmpbwt7u
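
For the ϵ-multi-player formulation quoted above, one natural (simplified, assumed) aggregation rule is to let a player borrow other players' samples for an arm but widen its confidence bound, since each borrowed sample's mean may be off by up to ϵ. A sketch of such an index:

```python
import numpy as np

def eps_aggregated_ucb(own, others, eps, t):
    """UCB index for one arm from own samples plus eps-tolerant borrowed ones.

    own, others: reward lists for this arm (this player vs. all other players).
    Borrowed samples can bias the pooled mean by at most eps each, so that
    bias enters the exploration bonus instead of being ignored.
    """
    pooled = np.concatenate([np.asarray(own, float), np.asarray(others, float)])
    n = len(pooled)
    bonus = np.sqrt(2 * np.log(t + 1) / n) + eps * len(others) / n
    return pooled.mean() + bonus

print(eps_aggregated_ucb([0.5, 0.6], [0.7, 0.4, 0.55], eps=0.1, t=100))
```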

Understanding the Limitations of Network Online Learning [article]

Timothy LaRock, Timothy Sakharov, Sahely Bhadra, Tina Eliassi-Rad
2020 arXiv   pre-print
In order to deal with the heavy-tailed nature of the target variable, we adapt the methods from [25] for regression in the presence of heavy-tailed distributions.  ...  Baseline Methods: We compare the performance of NOL* with 4 heuristic baseline methods and a multi-armed bandit method.  ...  [Figure caption] Results of running NOL* algorithms on the same networks as Fig. 3, but starting from initial samples obtained by random walk with jump sampling rather than node sampling with induction.  ...
arXiv:2001.07607v1 fatcat:7e2q5mowijgp7etrvt75rfbosu
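
The adaptation for heavy-tailed targets can be made concrete with a Huber loss, which is quadratic near zero and linear in the tails; the snippet does not say which method from [25] was adapted, so scikit-learn's HuberRegressor is used here purely as an assumed stand-in.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
# Heavy-tailed noise: Student-t with 2 degrees of freedom (infinite variance).
y = X @ np.array([2.0, -1.0]) + rng.standard_t(df=2, size=500)

ols = LinearRegression().fit(X, y)
huber = HuberRegressor(epsilon=1.35).fit(X, y)   # robust to outlying targets
print(ols.coef_)    # can be dragged around by tail events
print(huber.coef_)  # typically much closer to [2, -1]
```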

Artificial intelligence based cognitive routing for cognitive radio networks

Junaid Qadir
2015 Artificial Intelligence Review  
Cognitive radio networks (CRNs) are networks of nodes equipped with cognitive radios that can optimize performance by adapting to network conditions.  ...  These results motivated the need for simple models featuring correlated ON and OFF periods with heavy-tailed marginal distributions.  ...  In fact, the distributions of the ON and, in particular, the OFF periods were often found to be heavy-tailed [152].  ...
doi:10.1007/s10462-015-9438-6 fatcat:hi4mk5iaf5dsjgco5sgknejroq
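
The heavy-tailed ON/OFF observation can be reproduced in a few lines: Pareto-distributed durations (an assumed choice of heavy-tailed law) occasionally produce extremely long OFF periods, unlike an exponential model.

```python
import numpy as np

rng = np.random.default_rng(4)

def on_off_trace(n_cycles, alpha_on=1.9, alpha_off=1.2):
    """Alternating channel ON/OFF durations with Pareto (heavy) tails."""
    on = 1.0 + rng.pareto(alpha_on, n_cycles)
    off = 1.0 + rng.pareto(alpha_off, n_cycles)   # heavier tail for OFF
    return on, off

on, off = on_off_trace(10_000)
print(off.max() / np.median(off))   # a huge ratio is the hallmark of heavy tails
```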

Partnership as Experimentation: Business Organization and Survival in Egypt, 1910–1949

Cihan Artunç, Timothy W. Guinnane
2017 Social Science Research Network  
Using a multi-armed bandit model, we show that an experimentation mechanism creates a spike in dissolution rates early in firms' lives, as less productive matches break down and agents look for better  ...  We start with a model based on a multi-armed bandit framework. Agents match to one another to produce some surplus in each period.  ...  The expanded model becomes a multi-armed bandit superprocess. Each agent not only experiments with each "arm" but also considers an action that affects the arm's payoffs.  ...
doi:10.2139/ssrn.2973315 fatcat:w342ocjh4fbp3fqvucfu4smj64
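
The experimentation mechanism can be illustrated with a hypothetical two-option model: a partnership is kept while the posterior mean of its unknown quality exceeds an outside option, so dissolutions cluster early, when bad matches are revealed. All numbers below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
outside, horizon, n_firms = 0.5, 50, 5000
dissolved_at = []

for _ in range(n_firms):
    quality = rng.uniform(0, 1)            # unknown match productivity
    a, b = 1.0, 1.0                        # Beta(1, 1) prior on quality
    for t in range(1, horizon + 1):
        r = float(rng.random() < quality)  # Bernoulli surplus signal
        a += r
        b += 1.0 - r
        if a / (a + b) < outside:          # posterior mean below outside option
            dissolved_at.append(t)         # partnership dissolves; agents rematch
            break

counts = np.bincount(dissolved_at, minlength=horizon + 1)[1:]
print(counts[:5], counts[-5:])             # dissolutions spike in early periods
```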

Q-Learning with Basic Emotions [article]

Wilfredo Badoy Jr., Kardi Teknomo
2016 arXiv   pre-print
Simulations show that the proposed affective agent requires fewer steps to find the optimal path.  ...  In this paper, we propose using four basic emotions: joy, sadness, fear, and anger to influence a Q-learning agent.  ...  Ahn and Picard [19] simulated both single-step decision making, with a two-armed bandit gambling task, and a maze task.  ...
arXiv:1609.01468v1 fatcat:rng6fzzzsbgglckn3s3dnw5454
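
The abstract does not spell out how the four emotions enter the update, so the sketch below shows one hypothetical wiring in which the emotion only scales the learning rate of a standard Q-learning step; the alpha values and the mapping are assumptions, not the paper's mechanism.

```python
import numpy as np

# Hypothetical emotion-to-learning-rate mapping (assumed, for illustration).
ALPHA = {'joy': 0.3, 'sadness': 0.05, 'fear': 0.1, 'anger': 0.5}

def emotional_q_step(Q, s, a, r, s_next, emotion, gamma=0.95):
    """One Q-learning update whose step size depends on the agent's emotion."""
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += ALPHA[emotion] * td_error
    return Q

Q = np.zeros((10, 4))                    # 10 states, 4 actions
Q = emotional_q_step(Q, s=0, a=1, r=1.0, s_next=3, emotion='joy')
print(Q[0, 1])
```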

Federated Multi-Armed Bandits Under Byzantine Attacks [article]

Ilker Demirel, Yigit Yildirim, Cem Tekin
2022 arXiv   pre-print
Federated multi-armed bandits (FMAB) is a recently emerging framework where a cohort of learners with heterogeneous local models play a MAB game and communicate their aggregated feedback to a parameter  ...  Multi-armed bandits (MAB) is a simple reinforcement learning model where the learner controls the trade-off between exploration and exploitation to maximize its cumulative reward.  ...  In this work, we study the federated multi-armed bandit (FMAB) problem introduced in [3], where a cohort of clients play a multi-armed bandit game with heterogeneous local models to learn a global feedback  ...
arXiv:2205.04134v1 fatcat:zcleooiygrbpri7l67uhw5avxm
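
A standard defense in the Byzantine setting sketched above is to aggregate client reports with a median-of-means rather than a plain average, so a minority of corrupted reports cannot drag the global estimate arbitrarily; the sketch below is a generic illustration under assumed numbers, not necessarily the paper's exact estimator.

```python
import numpy as np

def median_of_means(reports, n_groups):
    """Byzantine-robust aggregate of per-client reward reports for one arm."""
    groups = np.array_split(np.asarray(reports, float), n_groups)
    return float(np.median([g.mean() for g in groups]))

rng = np.random.default_rng(6)
honest = rng.normal(0.7, 0.05, size=17)     # honest clients report ~0.7
byzantine = np.full(3, 100.0)               # 3 Byzantine clients report garbage
reports = np.concatenate([honest, byzantine])

print(reports.mean())                        # wrecked by the attackers
print(median_of_means(reports, n_groups=5))  # stays near 0.7
```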

A Centralized Multi-stage Non-parametric Learning Algorithm for Opportunistic Spectrum Access [article]

Thulasi Tholeti, Vishnu Raj, Sheetal Kalyani
2020 arXiv   pre-print
We provide a comprehensive empirical comparison of the method against other approaches.  ...  In this work, we propose a multi-stage algorithm that 1) effectively assigns the available channels to the SUs, 2) employs a non-parametric learning framework to estimate the primary traffic distribution  ...  A distributed spectrum sensing scheme based on reinforcement learning is proposed in [27] for multi-band sensing with multiple agents, where the agents are assumed to be time-synchronized.  ...
arXiv:1804.11135v2 fatcat:33ht4eplr5d73hmqk2ullarmui
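
The non-parametric estimation step can be illustrated with the simplest such estimator, an empirical CDF of observed primary idle durations, which assumes nothing about the traffic's parametric form; the gamma-distributed data below is an assumption for the demo.

```python
import numpy as np

rng = np.random.default_rng(9)
idle_times = rng.gamma(2.0, 3.0, size=500)   # observed primary idle durations

def ecdf(samples):
    """Empirical CDF: a non-parametric estimate of the duration distribution."""
    x = np.sort(np.asarray(samples, float))
    return lambda t: float(np.searchsorted(x, t, side='right')) / len(x)

F = ecdf(idle_times)
print(F(5.0))    # estimated P(idle time <= 5), no parametric model assumed
```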

Dynamic Spectrum Access in Time-varying Environment: Distributed Learning Beyond Expectation Optimization [article]

Yuhua Xu, Jinlong Wang, Qihui Wu, Jianchao Zheng, Liang Shen and Alagan Anpalagan
2017 arXiv   pre-print
Based on an approximated utility function, we propose a multi-agent learning algorithm which is proved to achieve stable solutions under dynamic and incomplete information constraints.  ...  Also, it is shown that the proposed multi-agent learning algorithm achieves satisfactory performance.  ...  Recently, the problem of dynamic spectrum access with varying channel states has begun to draw attention, using, e.g., Markovian decision processes (MDP) [13], online learning algorithms for multi-armed bandit  ...
arXiv:1502.06672v4 fatcat:ehiz6u5ep5ghnlhxvs7h7imozi

(In)Stability for the Blockchain: Deleveraging Spirals and Stablecoin Attacks [article]

Ariah Klages-Mundt, Andreea Minca
2021 arXiv   pre-print
Cryptocurrency returns are well known for having very heavy tails. This choice gives us such heavy tails while keeping the variance finite.  ...  This is facilitated by blockchain, which introduced a new way for mistrusting agents to cooperate without trusted third parties.  ...
arXiv:1906.02152v3 fatcat:z7ijvzrbfjgudmopqzc54zoglu
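
One distribution satisfying the quoted requirement (heavy tails, finite variance) is a Student-t with more than two degrees of freedom; the sketch uses ν = 3 as an assumed example, not necessarily the paper's exact return model.

```python
import numpy as np

rng = np.random.default_rng(7)
nu = 3.0                                   # dof > 2: heavy tails, finite variance
returns = rng.standard_t(nu, size=1_000_000)

print(returns.var())                       # close to nu / (nu - 2) = 3.0
print(np.mean(np.abs(returns) > 6.0))      # far more extreme events than a Gaussian
```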

Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction [article]

Dongyang Zhao, Liang Zhang, Bo Zhang, Lizhou Zheng, Yongjun Bao, Weipeng Yan
2019 arXiv   pre-print
The high-level agent catches long-term sparse conversion signals and automatically sets abstract goals for the low-level agent, while the low-level agent follows the abstract goals and interacts with real-time  ...  i.e., a high-level agent and a low-level agent.  ...  (HRL-MG), in which the high-level agent guides the low-level agent via multi-goals abstraction.  ...
arXiv:1903.09374v1 fatcat:pp55xrgzwravdmtcw6o45jop7q
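
The two-level architecture described above can be sketched as follows; the goal space, the UCB-style goal selection, and the trivial low-level policy are all assumptions standing in for the paper's learned components.

```python
import numpy as np

rng = np.random.default_rng(8)

class HighLevelAgent:
    """Sets an abstract goal, learning from sparse conversion signals."""
    def __init__(self, n_goals):
        self.value = np.zeros(n_goals)
        self.count = np.zeros(n_goals)
    def set_goal(self):
        return int(np.argmax(self.value + 1.0 / np.sqrt(self.count + 1.0)))
    def learn(self, goal, conversion):       # delayed, sparse feedback
        self.count[goal] += 1
        self.value[goal] += (conversion - self.value[goal]) / self.count[goal]

class LowLevelAgent:
    """Recommends concrete items consistent with the current abstract goal."""
    def act(self, goal, items_by_goal):
        return int(rng.choice(items_by_goal[goal]))

high, low = HighLevelAgent(n_goals=3), LowLevelAgent()
items_by_goal = {0: [10, 11], 1: [20, 21], 2: [30, 31]}
g = high.set_goal()
item = low.act(g, items_by_goal)
high.learn(g, conversion=1.0)                # the user eventually converted
print(g, item)
```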

Helicopters on the asymmetric battlefield: challenges for photonics

Johnny Heikell, David H. Titterton, Mark A. Richardson
2007 Technologies for Optical Countermeasures IV  
The problem set of battlefield helicopters and related photonics in asymmetric scenarios is addressed, with emphasis on survivability and electronic warfare.  ...  The term insurgent is used here for what is also termed, depending on one's view, guerilla, freedom fighter, terrorist, irregular, patriot, bandit, etc.  ...  Standoff detection of CBR agents is then needed to get warning of, and if possible avoid, contaminated areas.  ...
doi:10.1117/12.736632 fatcat:mtklbos525g2lbgiit62rmmnqy
Showing results 1–15 of 326