
Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design [article]

Yufei Ruan, Jiaqi Yang, Yuan Zhou
2021 arXiv   pre-print
Along the way, we propose the distributional optimal design, a natural extension of the optimal experiment design, and provide a statistically and computationally efficient learning algorithm for  ...  We consider two popular limited adaptivity models in the literature: batch learning and rare policy switches.  ...  Acknowledgments We thank Yanjun Han, Zhengyuan Zhou, and Zhengqing Zhou for their valuable comments.  ... 
arXiv:2007.01980v3 fatcat:tfttx2fc6neajjw3xw7zhcs2zu

Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection [article]

Charles E. Thornton, R. Michael Buehrer, Anthony F. Martone
2021 arXiv   pre-print
A sequential decision process in which an adaptive radar system repeatedly interacts with a finite-state target channel is studied.  ...  Stochastic and adversarial linear contextual bandit models are introduced, allowing the radar to achieve effective performance in broad classes of physical environments.  ...  Stochastic Linear Contextual Bandits and Thompson Sampling: We first examine a stochastic linear contextual bandit learning model, under which the cost at each PRI is characterized by the following structure  ... 
arXiv:2103.05541v2 fatcat:bvjexjasbvfbfbdobr4olshequ
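The stochastic linear contextual bandit with Thompson Sampling named in this entry's snippet can be sketched as follows. This is a minimal illustrative loop, not the paper's method: the Gaussian posterior, the scale parameter `v`, the noise level, and the random arm generator are all assumptions made for the example.

```python
import numpy as np

def linear_ts(contexts_fn, theta_star, horizon, v=0.5, seed=0):
    """Sketch of Thompson Sampling for a stochastic linear contextual
    bandit: the expected reward of action x is <theta_star, x>."""
    rng = np.random.default_rng(seed)
    d = len(theta_star)
    B = np.eye(d)          # precision matrix of the Gaussian posterior
    f = np.zeros(d)        # running sum of reward-weighted contexts
    regret = 0.0
    for _ in range(horizon):
        arms = contexts_fn(rng)                # candidate action features
        mu = np.linalg.solve(B, f)             # posterior mean
        theta = rng.multivariate_normal(mu, v**2 * np.linalg.inv(B))
        x = arms[int(np.argmax(arms @ theta))]  # act on the sampled model
        reward = x @ theta_star + 0.1 * rng.standard_normal()
        regret += float(max(arms @ theta_star) - x @ theta_star)
        B += np.outer(x, x)                    # Bayesian posterior update
        f += reward * x
    return regret

def random_arms(rng, k=10, d=5):
    # unit-norm random feature vectors, one per candidate action
    a = rng.standard_normal((k, d))
    return a / np.linalg.norm(a, axis=1, keepdims=True)

theta_star = np.ones(5) / np.sqrt(5)
r = linear_ts(random_arms, theta_star, horizon=2000)
# cumulative regret r grows sublinearly: far below the horizon of 2000
```

The posterior update is just online Bayesian linear regression, which is why the per-round cost stays at a few small matrix operations.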

Introduction to Multi-Armed Bandits

Aleksandrs Slivkins
2019 Foundations and Trends® in Machine Learning  
"Stochastic Linear Optimization under Bandit Feedback". In: 21st Conf. on Learning Theory (COLT). 355-366. Daskalakis, C., A. Deckelbaum, and A. Kim. 2015.  ...  Bandit-like designs for medical trials belong to the realm of adaptive medical trials (Chow and Chang, 2008), which can also include other "adaptive" features such as early stopping, sample size re-estimation  ... 
doi:10.1561/2200000068 fatcat:5drse7hks5fuzd6hriwrlgp27a
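For readers new to the topic, the classical stochastic multi-armed bandit this monograph introduces is often illustrated with UCB1. The sketch below is a generic textbook version, not taken from the monograph; the Bernoulli arm means and horizon are arbitrary choices for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 on Bernoulli arms. arm_means are the true success
    probabilities, which the algorithm never sees directly."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialize
        else:
            # empirical mean plus a confidence bonus that shrinks
            # as an arm accumulates pulls
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward

counts, _ = ucb1([0.2, 0.5, 0.8], horizon=5000)
# the best arm (index 2) receives the large majority of pulls
```

The confidence bonus is what yields the logarithmic regret guarantee: suboptimal arms are pulled only often enough to keep their upper confidence bounds below the best arm's empirical mean.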

A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit [article]

Giuseppe Burtini, Jason Loeppky, Ramon Lawrence
2015 arXiv   pre-print
Adaptive and sequential experiment design is a well-studied area in numerous domains.  ...  We survey and synthesize the work of the online statistical learning paradigm referred to as multi-armed bandits integrating the existing research as a resource for a certain class of online experiments  ...  adaptive adversary, the lower- and upper-bounds are linear in H.  ... 
arXiv:1510.00757v4 fatcat:eyxqdq3yl5fpdbv53wtnkfa25a

Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials

Michael Sklar, Mei-Chiung Shih, Philip Lavori
2021 Statistica sinica  
In recent years, statisticians and clinical scientists have defined two new approaches for studying the effects of medical practice, extending the "gold standard" classical randomized clinical trial to  ...  Chakraborty and Murphy (2014) discuss non-regular asymptotics for Q-learning with the linear model.  ...  Adaptive Randomization in a Learning Healthcare System: In an LHS, the arms of a multi-armed bandit are treatments and the rewards are patient outcomes.  ... 
doi:10.5705/ss.202020.0431 fatcat:flvhdv52ejfw3grajbprbssdxe

Bandit Algorithms in Information Retrieval

Dorota Glowacka
2019 Foundations and Trends in Information Retrieval  
Dorota Głowacka (2019), "Bandit Algorithms in Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 13, No. 4, pp 299-424. DOI: 10.1561/1500000067.  ...  "Online linear optimization and adaptive routing". Journal of Computer and System Sciences. 74(1): 97-114. Bouneffouf, D., A. Bouzeghoub, and A. L. Gançarski. 2012.  ...  Chapter 2 is primarily aimed at readers with little knowledge of reinforcement learning and bandits.  ... 
doi:10.1561/1500000067 fatcat:api5ljs5abbwdckujtsgwp27o4

Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation [article]

Artem Sokolov and Stefan Riezler and Tanguy Urvoy
2016 arXiv   pre-print
Our experiments show that our approach improves translation quality and is comparable to approaches that employ more informative feedback in learning.  ...  structure, is observed in learning.  ...  Acknowledgements This research was supported in part by DFG grant RI-2221/2-1 "Grounding Statistical Machine Translation in Perception and Action".  ... 
arXiv:1601.04468v1 fatcat:ljm5me33qfaexeqngr26m6yzqu

A Survey on Practical Applications of Multi-Armed and Contextual Bandits [article]

Djallel Bouneffouf, Irina Rish
2019 arXiv   pre-print
performance combined with certain attractive properties, such as learning from less feedback.  ...  The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit  ...  While recent approaches use Bayesian optimization to adaptively select optimal hyperparameter configurations, they rather focus on speeding up random search through adaptive resource allocation and early-stopping  ... 
arXiv:1904.10040v1 fatcat:j6v37wy7f5bmvpfzzhtnutbeoa

Contextual Bandits for adapting to changing User preferences over time [article]

Dattaraj Rao
2020 arXiv   pre-print
First, we make available a simplified simulated dataset showing varying user preferences over time and how this can be evaluated with static and dynamic learning algorithms.  ...  This dataset, made available as part of this research, is intentionally simulated with a limited number of features and can be used to evaluate different problem-solving strategies.  ...  We started with a simple synthetic dataset and showed the limitations of static learners in adapting to changing user preferences.  ... 
arXiv:2009.10073v2 fatcat:dxg4vd6xvbeklmtpgqumynex5e

Data-driven software design with Constraint Oriented Multi-variate Bandit Optimization (COMBO)

Rasmus Ros, Mikael Hammar
2020 Empirical Software Engineering  
COMBO provides several implementations of machine learning algorithms and constraint solvers, so that software developers without deep optimization knowledge can optimize the model with user data.  ...  Context Software design in e-commerce can be improved with user data through controlled experiments (i.e. A/B tests) to better meet user needs.  ...  Acknowledgements Thanks to Per Runeson, Elizabeth Bjarnason, Luigi Nardi, and the three anonymous  ... 
doi:10.1007/s10664-020-09856-1 fatcat:ym7nvhzogjhnbm5qxxzonu7si4

Bayesian Contextual Bandits for Hyper Parameter Optimization

Guoxin Sui, Yong Yu
2020 IEEE Access  
INDEX TERMS Automated machine learning, hyper parameter optimization, contextual bandits.  ...  However, these approaches impose a strong prior assumption on the distribution of learning curves and involve much larger computational complexity or rely heavily on predefined rules, which is not general  ...  adaptation of bandit strategies for balancing exploration and exploitation.  ... 
doi:10.1109/access.2020.2977129 fatcat:wwls7ues5zgxjovxqchd5bzs5y

Data Poisoning Attacks on Stochastic Bandits [article]

Fang Liu, Ness Shroff
2019 arXiv   pre-print
Our adaptive attack strategy can hijack the behavior of the bandit algorithm to suffer a linear regret with only a logarithmic cost to the attacker.  ...  Stochastic multi-armed bandits form a class of online learning problems that have important applications in online recommendation systems, adaptive medical treatment, and many others.  ...  with limited feedback.  ... 
arXiv:1905.06494v1 fatcat:uhmscq7d2fhmjjxse7zcm2raam
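To make the idea of reward poisoning concrete, here is a naive sketch against an epsilon-greedy learner. It is not the attack from this paper: the paper's adaptive strategy achieves the hijack at only logarithmic cost, whereas this illustrative version simply zeroes the observed rewards of non-target arms and pays a cost proportional to the learner's exploration.

```python
import random

def poisoned_eps_greedy(true_means, target, horizon, eps=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms, with an attacker who corrupts
    the reward stream so the learner converges to the target arm."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    attack_cost = 0.0
    for _ in range(horizon):
        if min(counts) == 0:
            arm = counts.index(0)           # play each arm once
        elif rng.random() < eps:
            arm = rng.randrange(k)          # explore
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        if arm != target:
            # attacker suppresses the observed reward of non-target arms
            attack_cost += reward
            reward = 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, attack_cost

counts, cost = poisoned_eps_greedy([0.9, 0.5, 0.2], target=2, horizon=5000)
# the learner is steered to the worst arm (index 2), so its regret
# against the true means grows linearly with the horizon
```

Since only the target arm ever shows positive empirical reward, the greedy step locks onto it; the learner's regret is linear while the attacker pays only for rewards drawn during exploration.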

Bandit Data-Driven Optimization [article]

Zheyuan Ryan Shi, Zhiwei Steven Wu, Rayid Ghani, Fei Fang
2022 arXiv   pre-print
Bandit data-driven optimization combines the advantages of online bandit learning and offline predictive analytics in an integrated framework.  ...  Applications of machine learning in the non-profit and public sectors often feature an iterative workflow of data acquisition, prediction, and optimization of interventions.  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the funding agencies.  ... 
arXiv:2008.11707v2 fatcat:af3omqsag5acdafy7xu7jhutka

Learning Sequential Channel Selection for Interference Alignment using Reconfigurable Antennas [article]

Nikhil Gulati, Rohit Bahl, Kapil R. Dandekar
2019 arXiv   pre-print
We show that by using an adaptive sequential learning policy, each node in the network can learn to select optimal channels without requiring full and instantaneous CSI for all the available antenna states  ...  A traditional ML-based approach or optimization is often not suitable due to algorithmic complexity, reliance on existing training data, and/or the distributed setting.  ...  On the other hand, the random selection policy has almost linear regret with respect to time, showing its sub-optimality.  ... 
arXiv:1712.06181v3 fatcat:6xl55ocgqvbllo574hdkhvy7da

Failure is Not an Option: Policy Learning for Adaptive Recovery in Space Operations

Steve McGuire, P. Michael Furlong, Christoffer Heckman, Simon Julier, Daniel Szafir, Nisar Ahmed
2018 IEEE Robotics and Automation Letters  
The contextual bandit outperforms conventional static policies and non-contextual learning approaches, and also demonstrates favorable robustness and scaling properties.  ...  The contextual bandits exploit information from observed environment and assistant performance variables to efficiently learn selection policies under a wide set of uncertain operating conditions and unknown  ...  The new optimal assistance allocation framework leverages contextual multi-armed bandit reinforcement learning algorithms.  ... 
doi:10.1109/lra.2018.2801468 dblp:journals/ral/McGuireFHJSA18 fatcat:jdesdvmppfg5vpfrftgm6x26bq