Thompson Sampling on Asymmetric α-Stable Bandits
[article]
2022
arXiv
pre-print
In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem, in which rewards conform to unknown asymmetric α-stable distributions, and explore their applications in modelling ...
Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that conform to various laws. ...
Based on the regret bound for symmetric α-stable Thompson sampling, we develop a regret bound for the asymmetric α-stable case in the parameter, action and observation spaces. ...
arXiv:2203.10214v2
fatcat:vovkvfjztfdybc3jj7n4ke3dae
Best Arm Identification with a Fixed Budget under a Small Gap
[article]
2022
arXiv
pre-print
First, we derive a tight problem-dependent lower bound, which characterizes the optimal allocation ratio that depends on the gap of the expected rewards and the Fisher information of the bandit model. ...
We consider the fixed-budget best arm identification problem in the multi-armed bandit problem. ...
K.1 Two-armed Gaussian Bandits: We compare the proposed RS-AIPW (RA) strategy with the alpha-elimination (Alpha; Kaufmann et al., 2014, 2016) and uniform sampling strategies (Uniform ...
arXiv:2201.04469v5
fatcat:hixw6qvsujbndksi7fnafxljai
Reinforcement Learning
[chapter]
2013
Neural Networks and Statistical Learning
For example, one approach is based on the theory of partially-observable MDPs (POMDPs). ...
Nevertheless, one can still attempt to construct a Markov state signal from the sequence of sensations. ...
It is sometimes called the theory of multi-stage decision processes, or sequential decision processes, and has roots in the statistical literature on sequential sampling beginning with the papers by Thompson ...
doi:10.1007/978-1-4471-5571-3_18
fatcat:l3xi3uyhtff35gai7qvvvedwbq
Reinforcement Learning
[chapter]
1997
Neural Systems for Control
For example, one approach is based on the theory of partially-observable MDPs (POMDPs). ...
Nevertheless, one can still attempt to construct a Markov state signal from the sequence of sensations. ...
It is sometimes called the theory of multi-stage decision processes, or sequential decision processes, and has roots in the statistical literature on sequential sampling beginning with the papers by Thompson ...
doi:10.1016/b978-012526430-3/50003-9
fatcat:nwyl3dsnrbc75csknm6u34nyty
Mixed-Variable Bayesian Optimization
[article]
2019
arXiv
pre-print
Thompson sampling. ...
Finally, we show that MiVaBo is significantly more sample efficient than state-of-the-art mixed-variable BO algorithms on hyperparameter tuning tasks. ...
Thompson sampling for contextual bandits with linear payoffs. In Interna- ... Gardner, J. R., Kusner, M. J., Xu, Z., Weinberger, K. ...
arXiv:1907.01329v3
fatcat:srdpkrqzpvc4bjdypnlrzupxza
A Survey on Deep Reinforcement Learning for Data Processing and Analytics
[article]
2022
arXiv
pre-print
Motivated by this trend, we provide a comprehensive review of recent works focusing on utilizing DRL to improve data processing and analytics. ...
Next, we discuss DRL deployment on database systems, facilitating data processing and analytics in various aspects, including data organization, scheduling, tuning, and indexing. ...
Then it formulates the choosing task as a contextual multi-armed bandit problem and uses Thompson sampling [92] to solve it. Bao is a hybrid solution for query optimization. ...
arXiv:2108.04526v3
fatcat:kcusgp7jzfbf7ov5os7gwf2e6i
Deep Reinforcement Learning, a textbook
[article]
2022
arXiv
pre-print
The successes in research have not gone unnoticed by educators, and universities have started to offer courses on the subject. ...
Other popular approaches to add exploration are to add Dirichlet noise [425] or to use Thompson sampling [770, 648]. ...
Alpha(Go) Zero thus learns starting at zero knowledge, tabula rasa. Self-play makes use of many reinforcement learning techniques. In order to ensure stable learning, exploration is important. ...
arXiv:2201.02135v2
fatcat:3icsopexerfzxa3eblpu5oal64
Deep Learning in Science
[article]
2020
arXiv
pre-print
Based on that sample, we document the DL diffusion process in the scientific system. ...
This paper provides insights on the diffusion and impact of DL in science. ...
All variables for the analysis are measured on the sampled papers. ...
arXiv:2009.01575v2
fatcat:4ttqgjdjfjbydp7flnhcgg5p7m
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
[article]
2021
arXiv
pre-print
Thus, we establish how the convergence rate to stationarity depends on the policy's tail index α, a Hölder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced ...
Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. ...
Advances in Neural Information Processing Systems, 2019. [92] William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. ...
arXiv:2106.08414v1
fatcat:kzr546wrrvheniyjfdox5k2ksy
An Ordinal Agent Framework
2022
We present a preference-based approach leveraging dueling bandits to sequential decision problems and discuss its disadvantages in terms of sample efficiency and scalability. ...
We test this approach on multi-armed bandits, leverage it to Monte-Carlo tree search, and also apply it to reinforcement learning. ...
For example, one could use Thompson sampling to solve a bandit problem and identify the Borda winner this way. ...
doi:10.26083/tuprints-00019749
fatcat:qgl6tlir5zfx7k7vnfcyhatpji
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
[article]
2021
arXiv
pre-print
Balancing both exploration and exploitation in different complex domains | Deep Bayesian Bandits Showdown using Thompson sampling | √ | Pearce et al. [188] | 2018 | Exploration in RL | Confidence in ...
[446] introduced a novel approximate inference method based on the minimization of α-divergences termed as black-box alpha (BB-α). ...
[363] | 2019 | Image processing | 2 | CXPlain | √ ...
arXiv:2011.06225v4
fatcat:wwnl7duqwbcqbavat225jkns5u
Abstracts of Working Papers in Economics
1989
Abstracts of Working Papers in Economics
As in standard wage indexation models, agents are unable to filter out the separate influences of demand and supply shocks on the observed price, so that the optimal wage indexation parameter is a weighted ...
Moreover, one must expect that the parameters alpha and beta vary from subject to subject. ...
Some Results on Two-Armed Bandits When Both Projects Vary (Columbia University). ...
doi:10.1017/s0951007900001212
fatcat:zxs3iv43crff7jt2bmqq6xbwqy
Artificial and Computational Intelligence in Games: Integration (Dagstuhl Seminar 15051)
2015
Dagstuhl Reports
The focus of the seminar was on the computational techniques used to create, enhance, and improve the experiences of humans interacting with and within virtual environments. ...
I wasn't there for the final playoffs, but MCTS2 did seem very strong, and stable, in testing. ...
Bandit-based Search for Constraint Programming. In Proc. of the AAAI Workshop on COmbining COnstraint solving with MIning and Learning (COCOMILE), 2013. [3] Paige, B. and Wood, F. ...
doi:10.4230/dagrep.5.1.207
dblp:journals/dagstuhl-reports/LucasMPST15
fatcat:326hvjmzcndnzgd3jyk343sc2u
Causality and Generalizability: Identifiability and Learning Methods
[article]
2021
arXiv
pre-print
We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. ...
Acknowledgments We thank Phillip Bredahl Mogensen and Thomas Berrett for helpful discussions on the entropy score and its estimation. ...
PB and JP thank David Bürge and Jan Ernest for helpful discussions on exploiting Chu-Liu-Edmonds' algorithm for causal discovery during the early stages of this project. ...
arXiv:2110.01430v1
fatcat:c4w4wjt3wbfnhkyfcgflxaskye
5. Economic Determinants of Conflict and Fear
[chapter]
2017
Warlike and Peaceful Societies: The Interaction of Genes and Culture
(2010, p. 30), Cashman (2013, chapter 7). [22] Cashman (2013, chapter 11). [23] Levy and Thompson (2010, p. 56). [24] Levy and Thompson (2010, p. 60). 4. ...
Of course, everything must have been on a smaller scale in prehistory, but even among chimpanzees and other social animals there is a large advantage to being the alpha male.[28] There are also costs ...
As with all Open Book publications, this entire book is available to read for free on the publisher's website. ...
doi:10.11647/obp.0128.05
fatcat:7m7u6lsuonebnajm4t6opzw5yy