108 Hits in 6.3 sec

Thompson Sampling on Symmetric α-Stable Bandits [article]

Abhimanyu Dubey, Alex Pentland
2019 arXiv   pre-print
In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric α-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and  ...  Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, while delivering remarkable empirical performance.  ...  We would also like to acknowledge Tor Lattimore and Csaba Szepesvári's text [LS] for its excellent treatment of bandit algorithms.  ... 
arXiv:1907.03821v2 fatcat:adndtf42g5c3bndlf5vlso2tge

Thompson Sampling on Asymmetric α-Stable Bandits [article]

Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei
2022 arXiv   pre-print
In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem, in which rewards conform to unknown asymmetric α-stable distributions, and explore their applications in modelling  ...  Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that conform to various laws.  ...  Based on the regret bound for symmetric α Thompson sampling, we develop a regret bound for the asymmetric α case in the parameter, action and observation spaces.  ... 
arXiv:2203.10214v2 fatcat:vovkvfjztfdybc3jj7n4ke3dae

Test Roll: Profit-Maximizing A/B Tests [article]

Elea McDonnell Feit, Ron Berman
2019 arXiv   pre-print
In all three cases, the optimal sample sizes are substantially smaller than for a traditional hypothesis test, resulting in higher profit.  ...  The proposed test design achieves nearly the same expected regret as the flexible, yet harder-to-implement multi-armed bandit under a wide range of conditions.  ...  Agrawal and Goyal (2013) show that the expected regret of a multi-armed bandit with Thompson sampling (Thompson 1933)  ...  Test & Roll with Asymmetric Normal Priors: The analysis thus far focused on cases  ... 
arXiv:1811.00457v2 fatcat:qzvxf6cm6nhdld3c6d32d7i3pq

A Reward Optimization Model for Decision-making under Budget Constraint

Chen Zhao, Bin Yang, Yu Hirate
2019 Journal of Information Processing  
Empirical evaluation suggests that an adapted version of Thompson sampling is the most suitable policy for the proposed algorithm.  ...  The proposed model addresses both problems based on a semi-parametric graphical model that approximates function outputs with limited data samples through Bayesian optimization.  ...  Instead of directly sampling observed CTR as r_i we found that Thompson sampling delivers a much more stable impression weight list if we apply some mapping [0, 1] → R to observations with a logit  ... 
doi:10.2197/ipsjjip.27.190 fatcat:32a75v5oxre5nfki4oqbstz5km
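The snippet above describes drawing CTRs from a posterior and mapping them through a logit before using them as impression weights. A minimal sketch of that idea, assuming Beta-Bernoulli posteriors (the function name, priors, and inputs are illustrative, not taken from the paper):

```python
import math
import random

def thompson_logit_weights(successes, failures):
    """Illustrative sketch: draw a CTR for each arm from its Beta posterior,
    then map the draw through the logit, (0, 1) -> R, to stabilize the
    resulting impression-weight list."""
    weights = []
    for s, f in zip(successes, failures):
        ctr = random.betavariate(s + 1, f + 1)   # posterior draw in (0, 1)
        weights.append(math.log(ctr / (1 - ctr)))  # logit transform
    return weights
```

Because the logit spreads values near 0 and 1 over the whole real line, small fluctuations in sampled CTRs no longer compress the weight list toward the interval endpoints.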

Diffusion Asymptotics for Sequential Experiments [article]

Stefan Wager, Kuang Xu
2021 arXiv   pre-print
As an application of this framework, we use the diffusion limit to obtain several new insights on the regret and belief evolution of Thompson sampling.  ...  We also demonstrate that, in this regime, the posterior beliefs underlying Thompson sampling are highly unstable over time.  ...  In doing so, we focus on Thompson sampling in the one- and two-armed bandit problems (i.e., with K = 1 or 2).  ... 
arXiv:2101.09855v3 fatcat:z6srvho6pfeghefolrhkriof3u

Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits [article]

Francesc Wilhelmi, Cristina Cano, Gergely Neu, Boris Bellalta, Anders Jonsson, Sergio Barrachina-Muñoz
2018 arXiv   pre-print
These strategies, contrary to UCB and Thompson sampling, base their operation on the absolute experienced reward, rather than on its distribution.  ...  We rely on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow networks to learn their best configuration.  ...  However, Thompson sampling is shown to be much more stable than the other mechanisms, since its variability in the aggregate throughput is much lower (depicted in Figure 4 (b)).  ... 
arXiv:1710.11403v3 fatcat:b6cpppgdnze5vf7zkzpvc4glfu

Bayesian decision-making under misspecified priors with applications to meta-learning [article]

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire
2021 arXiv   pre-print
Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits.  ...  We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most 𝒪̃(H^2 ϵ) from TS with a well-specified prior, where ϵ is the total-variation distance  ...  Sensitivity of Thompson Sampling and Related Bayesian Bandit Algorithms.  ... 
arXiv:2107.01509v1 fatcat:m2644v2yfng4lgeh74vwav6aq4

What You See May Not Be What You Get: UCB Bandit Algorithms Robust to ϵ-Contamination [article]

Laura Niss, Ambuj Tewari
2020 arXiv   pre-print
(EXP3++ and TsallisInf) even when our constraint on the proportion of contaminated rewards is broken.  ...  Motivated by applications of bandit algorithms in education, we consider a stochastic multi-armed bandit problem with ε-contaminated rewards.  ...  An example in education is a recent paper testing bandit Thompson sampling to identify high quality student generated solution explanations to math problems using MTurk participants (Williams et al.,  ... 
arXiv:1910.05625v3 fatcat:ir3uek5ogrbttlamahslkp266q

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems [article]

Aurélien Garivier, Gilles Stoltz
2018 arXiv   pre-print
We revisit lower bounds on the regret in the case of multi-armed bandit problems.  ...  We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences.  ...  [11], Thompson Sampling [24], EXP3 of Auer et al. [3], etc.  ... 
arXiv:1602.07182v3 fatcat:eftewjll75ekzhg5hx5cjigohu

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Aurélien Garivier, Pierre Ménard, Gilles Stoltz
2019 Mathematics of Operations Research  
We revisit lower bounds on the regret in the case of multi-armed bandit problems.  ...  We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergences.  ...  [11], Thompson Sampling [24], EXP3 of Auer et al. [3], etc.  ... 
doi:10.1287/moor.2017.0928 fatcat:jatwhzg6kvffzmdx4sbx2atmnm

GuideBoot: Guided Bootstrap for Deep Contextual Bandits [article]

Feiyang Pan, Haoming Li, Xiang Ao, Wei Wang, Yanrong Kang, Ao Tan, Qing He
2021 arXiv   pre-print
of Thompson sampling.  ...  It still remains largely unsolved to develop a practical method for complex deep contextual bandits.  ...  Osband and Van Roy [25] proposed a bandit algorithm named BootstrapThompson and showed the algorithm approximates Thompson sampling in Bernoulli bandits. Vaswani et al.  ... 
arXiv:2107.08383v1 fatcat:b35gbanfejhexma2qylbgdftma

Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game

Ole-Christoffer Granmo, Sondre Glimsdal
2012 Applied intelligence (Boston)  
simply on updating the hyperparameters of sibling conjugate priors, and on random sampling from these posteriors.  ...  In this manner, our scheme outperforms recently proposed Goore Game solution schemes, where one has to trade off accuracy with speed. As an additional benefit, performance also becomes more stable.  ...  technique for solving bandit-like problems, revisiting the Thompson Sampling [21] principle pioneered in 1933.  ... 
doi:10.1007/s10489-012-0346-z fatcat:lzs5cu53drevjhtqyjgekpppvy

Batched Thompson Sampling [article]

Cem Kalkanli, Ayfer Ozgur
2021 arXiv   pre-print
We introduce a novel anytime Batched Thompson sampling policy for multi-armed bandits where the agent observes the rewards of her actions and adjusts her policy only at the end of a small number of batches  ...  These results also indicate that Thompson sampling performs competitively with recently proposed algorithms tailored for the batched setting.  ...  Batched Thompson Sampling In this section, we describe our Batched Thompson sampling strategy for the batched multi-armed bandit setting described in the previous section.  ... 
arXiv:2110.00202v1 fatcat:lcs6fjb55bat3j647wacxovkxy
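The batched setting described above lets the agent update its policy only at batch boundaries. A hedged sketch of that idea with Beta-Bernoulli posteriors (this is an illustration of the batched mechanism, not the paper's exact policy or batch schedule):

```python
import random

def batched_thompson(reward_fn, n_arms, batch_sizes):
    """Sketch: within a batch, arms are chosen by sampling from posteriors
    that stay frozen; observed rewards are folded into the posteriors only
    when the batch ends."""
    alpha = [1.0] * n_arms  # Beta posterior parameters (uniform prior)
    beta = [1.0] * n_arms
    history = []
    for batch in batch_sizes:
        pending = []  # rewards seen this batch, not yet in the posterior
        for _ in range(batch):
            samples = [random.betavariate(alpha[a], beta[a])
                       for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: samples[a])
            r = reward_fn(arm)  # Bernoulli reward in {0, 1}
            pending.append((arm, r))
            history.append((arm, r))
        for arm, r in pending:  # deferred posterior update
            alpha[arm] += r
            beta[arm] += 1 - r
    return history
```

The only difference from standard Thompson sampling is the deferred update loop; with batch sizes of 1 the sketch reduces to the fully sequential algorithm.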

Adaptive Treatment Assignment in Experiments for Policy Choice

Maximilian Kasy, Anja Sautmann
2021 Econometrica  
Standard experimental designs are geared toward point estimation and hypothesis testing, while bandit algorithms are geared toward in‐sample outcomes.  ...  We prove an asymptotic optimality result for this algorithm and demonstrate improvements in welfare in calibrated simulations over both non‐adaptive designs and bandit algorithms.  ...  Below, we first briefly discuss one of the most popular (and oldest) bandit algorithms, so-called Thompson sampling, originally proposed by Thompson (1933) .  ... 
doi:10.3982/ecta17527 fatcat:l6zngf3il5d77n2c75dh73o7ee

Adaptive Combinatorial Allocation [article]

Maximilian Kasy, Alexander Teytelboym
2020 arXiv   pre-print
Our model covers two-sided and one-sided matching, even with complex constraints. We propose an approach based on Thompson sampling.  ...  Our main result is a prior-independent finite-sample bound on the expected regret for this algorithm.  ...  In order to form 1 − α credible sets for the parameters Θ j given the history F t , sample a large number of draws Θ̂_t from the posterior, and form a credible interval based on the α/2 and 1 − α/2 quantiles  ... 
arXiv:2011.02330v1 fatcat:dmmqyic43ffprbc7mqi75f5ymy
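The quantile construction in the snippet above can be sketched directly: take the α/2 and 1 − α/2 empirical quantiles of the posterior draws (the function name and the nearest-rank indexing rule are illustrative assumptions):

```python
def credible_interval(draws, alpha=0.05):
    """Sketch: form a 1 - alpha credible interval from posterior draws by
    taking the alpha/2 and 1 - alpha/2 empirical quantiles (nearest rank)."""
    xs = sorted(draws)
    n = len(xs)
    lo_idx = int(round((alpha / 2) * (n - 1)))
    hi_idx = int(round((1 - alpha / 2) * (n - 1)))
    return xs[lo_idx], xs[hi_idx]
```

With enough draws the empirical quantiles converge to the posterior quantiles, so the interval covers the parameter with posterior probability approaching 1 − α.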
Showing results 1 — 15 out of 108 results