
Optimal Resource Allocation with Semi-Bandit Feedback [article]

Tor Lattimore and Koby Crammer and Csaba Szepesvári
2014 arXiv   pre-print
Allocating more resources to a given job increases the probability that it completes, but with a cut-off.  ...  We study a sequential resource allocation problem involving a fixed number of recurring jobs.  ...  jobs under optimal allocation; S* optimal amount of resources assigned to overflow process; A* contains the ℓ easiest jobs (sorted by ν_k); A_t set of jobs with M_{k,t} = ν_{k,t−1} at time-step t; B_t equal  ... 
arXiv:1406.3840v1 fatcat:bp5cfurj45eqti2qzigs3tz7ki

Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback [article]

Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, Raman Sankaran
2019 arXiv   pre-print
Exploiting these equivalences, we derive optimal algorithms for our setting using existing algorithms for MP-MAB and Combinatorial Semi-Bandits.  ...  In this paper, we study censored Semi-Bandits, a novel variant of the semi-bandits problem.  ...  Thus as resources increase, we move from semi-bandit feedback to bandit feedback and hence regret increases with the resources.  ... 
arXiv:1909.01504v3 fatcat:npmoo4pmjjef7g2fyqkny7qj5y

Playing Repeated Network Interdiction Games with Semi-Bandit Feedback

Qingyu Guo, Bo An, Long Tran-Thanh
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
, which exploits the unique semi-bandit feedback in network security domains.  ...  We prove that SBGA achieves sublinear regret against an adaptive adversary, compared with both the best fixed strategy in hindsight and a near-optimal adaptive strategy.  ...  Unfortunately, we only have semi-bandit feedback in repeated NIG.  ... 
doi:10.24963/ijcai.2017/515 dblp:conf/ijcai/GuoAT17 fatcat:mqgw27cttvdsngnsyzijgq2vy4

Censored Semi-Bandits for Resource Allocation [article]

Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, Raman Sankaran
2021 arXiv   pre-print
We consider the problem of sequentially allocating resources in a censored semi-bandits setup, where the learner allocates resources at each step to the arms and observes loss.  ...  Exploiting these equivalences, we derive optimal algorithms for our problem setting using known algorithms for MP-MAB and Combinatorial Semi-Bandits.  ...  Thus as resources increase, we move from semi-bandit feedback to bandit feedback. Therefore, regret increases with an increase in the amount of resources.  ... 
arXiv:2104.05781v1 fatcat:pw35y6dpcfcunfn5tviqd6jvwi
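The censored feedback model described in the two abstracts above can be illustrated with a small toy simulation (illustrative only; the thresholds, loss probabilities, and function names below are hypothetical, not taken from the papers):

```python
import random

# Toy censored semi-bandit feedback: arm k incurs a Bernoulli loss only
# when the resource allocated to it falls below its unknown threshold;
# with enough resources the loss is suppressed (censored).

THRESHOLDS = [0.4, 0.6, 0.8]   # unknown to the learner (hypothetical values)
LOSS_PROBS = [0.9, 0.5, 0.7]   # mean losses, also unknown (hypothetical)

def observe(allocation, rng):
    losses = []
    for a_k, theta_k, mu_k in zip(allocation, THRESHOLDS, LOSS_PROBS):
        if a_k < theta_k:
            # Under-resourced arm: a loss is incurred and observed.
            losses.append(1 if rng.random() < mu_k else 0)
        else:
            # Censored: enough resources were allocated, no loss occurs.
            losses.append(0)
    return losses

rng = random.Random(1)
losses = observe([0.5, 0.5, 0.5], rng)  # arm 0 is covered; arms 1 and 2 are not
```

The learner's difficulty is visible here: allocating more resources censors more arms, which is good for the loss but removes the feedback needed to estimate the unknown parameters.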

Combinatorial Multi-armed Bandits for Resource Allocation [article]

Jinhang Zuo, Carlee Joe-Wong
2021 arXiv   pre-print
We prove the proposed algorithms achieve logarithmic regrets under semi-bandit feedback.  ...  We design combinatorial multi-armed bandit algorithms to solve this problem with discrete or continuous budgets.  ...  We then observe the semi-bandit feedback, which is the reward f k (a k,t , X k,t ) from each resource k, where X k,t is sampled from an unknown distribution D k .  ... 
arXiv:2105.04373v1 fatcat:lruysvy455hazdfn5drcxdkkwi
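The per-resource ("semi-bandit") feedback described in this abstract can be sketched as a toy model (illustrative only; the reward function and distributions below are hypothetical, not the authors' code):

```python
import random

# Toy semi-bandit feedback for resource allocation: K resources, an
# allocation a = (a_1, ..., a_K), and per-resource rewards f_k(a_k, X_k)
# with X_k drawn from an unknown distribution D_k.

def f(a_k, x_k):
    # Hypothetical concave reward: diminishing returns in the budget a_k.
    return x_k * (1 - (1 - a_k) ** 2)

def step(allocation, rng):
    # Semi-bandit feedback: the learner observes each resource's reward,
    # not just their sum (bandit) and not the full functions (full info).
    draws = [rng.random() for _ in allocation]  # X_k ~ D_k (here Uniform[0,1])
    return [f(a_k, x_k) for a_k, x_k in zip(allocation, draws)]

rng = random.Random(0)
feedback = step([0.5, 0.3, 0.2], rng)  # a budget of 1 split over 3 resources
total_reward = sum(feedback)
```

Observing the vector of per-resource rewards, rather than only their sum, is what lets combinatorial bandit algorithms update an estimate for every resource touched by the allocation.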

Linear Multi-Resource Allocation with Semi-Bandit Feedback

Tor Lattimore, Koby Crammer, Csaba Szepesvári
2015 Neural Information Processing Systems  
Our main contribution is the new setting and an algorithm with nearly-optimal regret analysis.  ...  We study an idealised sequential resource allocation problem. In each time step the learner chooses an allocation of several resource types between a number of tasks.  ...  Also worth mentioning is that the resource allocation problem at hand is quite different to the "linear semi-bandit" proposed and analysed by Krishnamurthy et al. [2015] where the action set is also  ... 
dblp:conf/nips/LattimoreCS15 fatcat:zmittjzsmrcbbau6vzmxwbqwx4

Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach [article]

Arun Verma, Manjesh K. Hanawal
2020 arXiv   pre-print
We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents.  ...  We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits.  ...  Resource allocation with semi-bandit feedback [20], [21], [22] study a related but less general setting where the reward is observed in each round irrespective of the amount of resource allocated  ... 
arXiv:2006.09997v1 fatcat:4mp2iaerurdgljrsvk4nm4cobu

Spectrum Bandit Optimization [article]

Marc Lelarge and Alexandre Proutiere and M. Sadegh Talebi
2015 arXiv   pre-print
the need for exploring sub-optimal allocations.  ...  When radio conditions are unknown a priori, we look for a sequential channel allocation policy that converges to the optimal allocation while minimizing on the way the throughput loss or regret due to  ...  Most of the results presented here concern scenarios where semi-bandit feedback is available.  ... 
arXiv:1302.6974v4 fatcat:63kgffdo4fhv3bglnreioyllxy

Experimental Design for Regret Minimization in Linear Bandits [article]

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson
2021 arXiv   pre-print
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.  ...  In addition, we show that with slight modification our algorithm can be used for pure exploration, obtaining state-of-the-art pure exploration guarantees in the semi-bandit setting.  ...  Then there exists an O(m)-dimensional combinatorial bandit problem with semi-bandit feedback where: Figure 1: Resource allocation example with d = 5.  ... 
arXiv:2011.00576v2 fatcat:zi3cq3gghbegpmqavzik5outzi

Big-Data Streaming Applications Scheduling Based on Staged Multi-armed Bandits

Karim Kanoun, Cem Tekin, David Atienza, Mihaela van der Schaar
2016 IEEE transactions on computers  
multiple streams on many-core systems with resource constraints.  ...  The proposed scheduler, applied on a face detection streaming application and without using any offline information, is able to achieve similar performance compared to an optimal semi-online solution that  ...  ACKNOWLEDGMENTS This work has been partially supported by the YINS RTD project (no. 20NA21 150939), funded with Swiss Confederation financing and scientifically evaluated by SNSF, and the  ... 
doi:10.1109/tc.2016.2550454 fatcat:il4m3ajp5zdgzjptbh37s33ytq

Individually Fair Learning with One-Sided Feedback [article]

Yahav Bechavod, Aaron Roth
2022 arXiv   pre-print
We then construct an efficient reduction from our problem of online learning with one-sided feedback and a panel reporting fairness violations to the contextual combinatorial semi-bandit problem (Cesa-Bianchi  ...  Finally, we show how to leverage the guarantees of two algorithms in the contextual combinatorial semi-bandit setting: Exp2 (Bubeck et al., 2012) and the oracle-efficient Context-Semi-Bandit-FTPL (Syrgkanis  ...  Ensign et al. (2018) and Elzayn et al. (2019) focus on the tasks of predictive policing and related resource allocation problems, and give algorithms for these tasks under a censored feedback model.  ... 
arXiv:2206.04475v1 fatcat:ufxk64hdlbhefdwt4svftnwufu

Extreme bandits

Alexandra Carpentier, Michal Valko
2014 Neural Information Processing Systems  
In this paper, we study an efficient way to allocate these resources sequentially under limited feedback.  ...  While sequential design of experiments is well studied in bandit theory, the most commonly optimized property is the regret with respect to the maximum mean reward.  ...  The main objective of our work is the active allocation of the sampling resources for anomaly detection, in the setting where anomalies are defined as extreme values.  ... 
dblp:conf/nips/CarpentierV14 fatcat:cbzc427tsfdbtf35dpkwsj4rnm

Path Planning Problems with Side Observations—When Colonels Play Hide-and-Seek

Dong Quan Vu, Patrick Loiseau, Alonso Silva, Long Tran-Thanh
2020 Proceedings of the AAAI Conference on Artificial Intelligence  
the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus  ...  Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions.  ...  Henceforth, we will use the term SOPPP to refer to this PPP under semi-bandit feedback with side-observations.  ... 
doi:10.1609/aaai.v34i02.5602 fatcat:2coimqj6szbn3feoubf7cqbhwe

Stochastic Bandits for Multi-platform Budget Optimization in Online Advertising [article]

Vashist Avadhanula, Riccardo Colini-Baldeschi, Stefano Leonardi, Karthik Abinav Sankararaman, Okke Schrijvers
2021 arXiv   pre-print
We model this challenging practical application as a Stochastic Bandits with Knapsacks problem over T rounds of bidding with the set of arms given by the set of distinct bidding m-tuples, where m is the  ...  Namely, for discrete bid spaces we give an algorithm with regret O(OPT·√(mn/B) + √(mn·OPT)), where OPT is the performance of the optimal algorithm that knows the distributions.  ...  Adversarial bandits with knapsack are also extended in [25] to the combinatorial semi-bandit, contextual, and convex optimization settings.  ... 
arXiv:2103.10246v2 fatcat:k7mhz4fk7rdcvktawjdgqt6uxm

Adversarial Bandits with Knapsacks [article]

Nicole Immorlica and Karthik Abinav Sankararaman and Robert Schapire and Aleksandrs Slivkins
2022 arXiv   pre-print
We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints.  ...  In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack.  ...  Bandit Convex Optimization with Knapsacks We consider Bandit Convex Optimization with Knapsacks (BCOwK), a common generalization of BwK and bandit convex optimization.  ... 
arXiv:1811.11881v9 fatcat:kzjxy26xangg3pd53z3bun2bu4
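The budget-constrained interaction that defines BwK can be sketched as a toy loop (illustrative only; the arm rewards, costs, and stopping rule below are a simplified sketch, not the authors' algorithm):

```python
import random

# Toy Bandits-with-Knapsacks loop: each arm pull yields a stochastic
# reward and consumes a known amount of a shared resource; the game
# ends once the next pull would exceed the budget.

REWARDS = [0.3, 0.7]   # mean reward per arm (hypothetical)
COSTS = [1, 5]         # resource cost per pull (hypothetical)
BUDGET = 20

def run(policy, rng):
    spent, total, rounds = 0, 0, 0
    while True:
        arm = policy(rounds)
        if spent + COSTS[arm] > BUDGET:  # hard stop: knapsack constraint
            break
        spent += COSTS[arm]
        total += 1 if rng.random() < REWARDS[arm] else 0
        rounds += 1
    return total, rounds

# A policy that always pulls the cheap arm gets many more rounds of play,
# even though its per-round reward is lower -- the packing trade-off.
reward, rounds = run(lambda t: 0, random.Random(2))
```

The trade-off a BwK algorithm must solve is visible in the last line: the cheap arm allows 20 pulls within the budget, while the expensive arm would allow only 4, so per-pull reward alone does not determine the optimal packing.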
Showing results 1 — 15 out of 664 results