22 Hits in 3.9 sec

Thompson Sampling on Asymmetric α-Stable Bandits [article]

Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei
2022 arXiv   pre-print
In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem, in which rewards conform to unknown asymmetric α-stable distributions, and explore their applications in modelling  ...  Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that conform to various laws.  ...  Based on the regret bound for symmetric α Thompson sampling, we develop a regret bound for the asymmetric α case in the parameter, action and observation spaces.  ...
arXiv:2203.10214v2 fatcat:vovkvfjztfdybc3jj7n4ke3dae
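The entry above applies Thompson sampling to α-stable reward distributions; as a point of reference, the standard method it builds on can be sketched in the classical Beta-Bernoulli setting (an illustrative simplification, not the paper's α-stable model):

```python
import random

def thompson_sampling(arms, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: sample a mean for each arm from
    its Beta posterior and pull the arm whose sample is largest."""
    rng = random.Random(seed)
    # Beta(1, 1) prior per arm: alpha = successes + 1, beta = failures + 1
    alpha = [1] * len(arms)
    beta = [1] * len(arms)
    total_reward = 0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(len(arms))]
        i = samples.index(max(samples))          # exploration via posterior sampling
        reward = 1 if rng.random() < arms[i] else 0  # Bernoulli(p_i) reward
        total_reward += reward
        if reward:
            alpha[i] += 1
        else:
            beta[i] += 1
    return total_reward, alpha, beta

# Arms with true means 0.2, 0.5, 0.8: the posterior concentrates on arm 2.
reward, a, b = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

The α-stable variant in the paper replaces the Beta-Bernoulli conjugate pair with posteriors over α-stable parameters; the select-by-posterior-sample loop is unchanged.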

Best Arm Identification with a Fixed Budget under a Small Gap [article]

Masahiro Kato and Kaito Ariu and Masaaki Imaizumi and Masatoshi Uehara and Masahiro Nomura and Chao Qin
2022 arXiv   pre-print
First, we derive a tight problem-dependent lower bound, which characterizes the optimal allocation ratio that depends on the gap of the expected rewards and the Fisher information of the bandit model.  ...  We consider the fixed-budget best arm identification problem in the multi-armed bandit setting.  ...  K.1 Two-armed Gaussian Bandits. We compare the proposed RS-AIPW (RA) strategy with the alpha-elimination (Alpha; Kaufmann et al., 2014, 2016) and uniform sampling (Uniform) strategies  ...
arXiv:2201.04469v5 fatcat:hixw6qvsujbndksi7fnafxljai
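The uniform sampling baseline this entry compares against is the simplest fixed-budget strategy: spend the budget equally across arms, then recommend the empirically best one. A minimal sketch with Gaussian rewards (an illustrative baseline, not the paper's RS-AIPW strategy):

```python
import random

def uniform_bai(means, budget, seed=0):
    """Fixed-budget best-arm identification baseline: allocate the budget
    uniformly across arms, then recommend the empirically best arm."""
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        i = t % k                            # round-robin = uniform allocation
        sums[i] += rng.gauss(means[i], 1.0)  # Gaussian reward, unit variance
        counts[i] += 1
    estimates = [sums[i] / counts[i] for i in range(k)]
    return estimates.index(max(estimates))

# A small gap between the top arms is exactly the hard regime the paper studies.
best = uniform_bai([0.0, 0.1, 0.5], budget=3000)
```

The paper's point is that when the gap shrinks, the optimal allocation ratio deviates from uniform in a way governed by the Fisher information.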

Reinforcement Learning [chapter]

Ke-Lin Du, M. N. S. Swamy
2013 Neural Networks and Statistical Learning  
For example, one approach is based on the theory of partially-observable MDPs (POMDPs).  ...  Nevertheless, one can still attempt to construct a Markov state signal from the sequence of sensations.  ...  It is sometimes called the theory of multi-stage decision processes, or sequential decision processes, and has roots in the statistical literature on sequential sampling beginning with the papers by Thompson  ... 
doi:10.1007/978-1-4471-5571-3_18 fatcat:l3xi3uyhtff35gai7qvvvedwbq

Reinforcement Learning [chapter]

Andrew G. Barto
1997 Neural Systems for Control  
For example, one approach is based on the theory of partially-observable MDPs (POMDPs).  ...  Nevertheless, one can still attempt to construct a Markov state signal from the sequence of sensations.  ...  It is sometimes called the theory of multi-stage decision processes, or sequential decision processes, and has roots in the statistical literature on sequential sampling beginning with the papers by Thompson  ... 
doi:10.1016/b978-012526430-3/50003-9 fatcat:nwyl3dsnrbc75csknm6u34nyty

Mixed-Variable Bayesian Optimization [article]

Erik Daxberger, Anastasia Makarova, Matteo Turchetta, Andreas Krause
2019 arXiv   pre-print
Thompson sampling.  ...  Finally, we show that MiVaBo is significantly more sample efficient than state-of-the-art mixed-variable BO algorithms on hyperparameter tuning tasks.  ...  Thompson sampling for contextual bandits with linear payoffs. In Interna- Gardner, J. R., Kusner, M. J., Xu, Z., Weinberger, K.  ... 
arXiv:1907.01329v3 fatcat:srdpkrqzpvc4bjdypnlrzupxza

A Survey on Deep Reinforcement Learning for Data Processing and Analytics [article]

Qingpeng Cai, Can Cui, Yiyuan Xiong, Wei Wang, Zhongle Xie, Meihui Zhang
2022 arXiv   pre-print
Motivated by this trend, we provide a comprehensive review of recent works focusing on utilizing DRL to improve data processing and analytics.  ...  Next, we discuss DRL deployment on database systems, facilitating data processing and analytics in various aspects, including data organization, scheduling, tuning, and indexing.  ...  Then it formulates the choosing task as a contextual multi-armed bandit problem and uses Thompson sampling [92] to solve it. Bao is a hybrid solution for query optimization.  ... 
arXiv:2108.04526v3 fatcat:kcusgp7jzfbf7ov5os7gwf2e6i

Deep Reinforcement Learning, a textbook [article]

Aske Plaat
2022 arXiv   pre-print
The successes in research have not gone unnoticed by educators, and universities have started to offer courses on the subject.  ...  Other popular approaches to add exploration are to add Dirichlet noise [425] or to use Thompson sampling [770, 648].  ...  Alpha(Go) Zero thus learns starting from zero knowledge, tabula rasa. Self-play makes use of many reinforcement learning techniques. In order to ensure stable learning, exploration is important.  ...
arXiv:2201.02135v2 fatcat:3icsopexerfzxa3eblpu5oal64

Deep Learning in Science [article]

Stefano Bianchini, Moritz Müller, Pierre Pelletier
2020 arXiv   pre-print
Based on that sample, we document the DL diffusion process in the scientific system.  ...  This paper provides insights on the diffusion and impact of DL in science.  ...  All variables for the analysis are measured on the sampled papers.  ... 
arXiv:2009.01575v2 fatcat:4ttqgjdjfjbydp7flnhcgg5p7m

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control [article]

Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel
2021 arXiv   pre-print
Thus, we establish how the convergence rate to stationarity depends on the policy's tail index alpha, a Hölder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced  ...  Due to its scaling to continuous spaces, we focus on policy search, where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates.  ...  Advances in Neural Information Processing Systems, 2019. [92] William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples.  ...
arXiv:2106.08414v1 fatcat:kzr546wrrvheniyjfdox5k2ksy

An Ordinal Agent Framework

Tobias Joppen
2022
We present a preference-based approach that applies dueling bandits to sequential decision problems and discuss its disadvantages in terms of sample efficiency and scalability.  ...  We test this approach on multi-armed bandits, extend it to Monte-Carlo tree search, and also apply it to reinforcement learning.  ...  For example, one could use Thompson sampling to solve a bandit problem and identify the Borda winner this way.  ...
doi:10.26083/tuprints-00019749 fatcat:qgl6tlir5zfx7k7vnfcyhatpji
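The snippet mentions identifying the Borda winner of a dueling bandit, i.e. the arm with the highest average win probability against all opponents. A minimal sketch using uniform random duels (a simple estimation baseline, not the thesis's Thompson-sampling variant; the preference matrix is a made-up example):

```python
import random

def borda_winner(pref, duels, seed=0):
    """Estimate the Borda winner of a dueling bandit via uniform random
    duels. pref[i][j] is the probability that arm i beats arm j; the
    Borda winner maximizes the mean win probability over opponents."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    plays = [[0] * k for _ in range(k)]
    for _ in range(duels):
        i, j = rng.sample(range(k), 2)       # pick a random pair to duel
        if rng.random() < pref[i][j]:
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        plays[i][j] += 1
        plays[j][i] += 1
    # Borda score: mean estimated win rate against every other arm.
    scores = [
        sum(wins[i][j] / max(plays[i][j], 1) for j in range(k) if j != i) / (k - 1)
        for i in range(k)
    ]
    return scores.index(max(scores))

# Transitive preferences: arm 2 beats everyone, so it is the Borda winner.
pref = [[0.5, 0.4, 0.2],
        [0.6, 0.5, 0.3],
        [0.8, 0.7, 0.5]]
winner = borda_winner(pref, duels=3000)
```

A bandit algorithm such as Thompson sampling would replace the uniform pair selection with posterior sampling over the pairwise win probabilities, reducing the number of duels wasted on clearly dominated arms.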

A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges [article]

Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul Fieguth, Xiaochun Cao, Abbas Khosravi, U Rajendra Acharya, Vladimir Makarenkov, Saeid Nahavandi
2021 arXiv   pre-print
Balancing exploration and exploitation in different complex domains: Deep Bayesian Bandits Showdown using Thompson sampling; Pearce et al. [188], 2018, exploration in RL  ...  [446] introduced a novel approximate inference method, based on the minimization of α-divergences, termed black-box alpha (BB-α).  ...  [363], 2019, image processing: CXPlain  ...
arXiv:2011.06225v4 fatcat:wwnl7duqwbcqbavat225jkns5u

Abstracts of Working Papers in Economics

1989 Abstracts of Working Papers in Economics  
As in standard wage indexation models, agents are unable to filter out the separate influences of demand and supply shocks on the observed price, so that the optimal wage indexation parameter is a weighted  ...  Moreover, one must expect that the parameters alpha and beta vary from subject to subject.  ...  Some Results on Two-Armed Bandits When Both Projects Vary. Columbia University.  ...
doi:10.1017/s0951007900001212 fatcat:zxs3iv43crff7jt2bmqq6xbwqy

Artificial and Computational Intelligence in Games: Integration (Dagstuhl Seminar 15051)

Simon M. Lucas, Michael Mateas, Mike Preuss, Pieter Spronck, Julian Togelius, Marc Herbstritt
2015 Dagstuhl Reports  
The focus of the seminar was on the computational techniques used to create, enhance, and improve the experiences of humans interacting with and within virtual environments.  ...  I wasn't there for the final playoffs, but MCTS2 did seem very strong, and stable, in testing.  ...  Bandit-based Search for Constraint Programming. In Proc. of the AAAI Workshop on Combining Constraint Solving with Mining and Learning (CoCoMile), 2013. Paige, B. and Wood, F.  ...
doi:10.4230/dagrep.5.1.207 dblp:journals/dagstuhl-reports/LucasMPST15 fatcat:326hvjmzcndnzgd3jyk343sc2u

Causality and Generalizability: Identifiability and Learning Methods [article]

Martin Emil Jakobsen
2021 arXiv   pre-print
We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics.  ...  Acknowledgments We thank Phillip Bredahl Mogensen and Thomas Berrett for helpful discussions on the entropy score and its estimation.  ...  PB and JP thank David Bürge and Jan Ernest for helpful discussions on exploiting Chu-Liu-Edmonds' algorithm for causal discovery during the early stages of this project.  ... 
arXiv:2110.01430v1 fatcat:c4w4wjt3wbfnhkyfcgflxaskye

5. Economic Determinants of Conflict and Fear [chapter]

Agner Fog
2017 Warlike and Peaceful Societies: The Interaction of Genes and Culture  
(2010, p. 30), Cashman (2013, chapter 7), Cashman (2013, chapter 11), Levy and Thompson (2010, p. 56), Levy and Thompson (2010, p. 60)  ...  Of course, everything must have been on a smaller scale in prehistory, but even among chimpanzees and other social animals there is a large advantage to being the alpha male.  ...  As with all Open Book publications, this entire book is available to read for free on the publisher's website.  ...
doi:10.11647/obp.0128.05 fatcat:7m7u6lsuonebnajm4t6opzw5yy
Showing results 1–15 of 22