2,742 Hits in 6.4 sec

Efficient Contextual Bandits with Continuous Actions [article]

Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins
2020 arXiv   pre-print
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.  ...  Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.  ...  Broader Impact: Our study of efficient contextual bandits with continuous actions can be applied to a wide range of applications, such as precision medicine, personalized recommendations, and data center optimization  ... 
arXiv:2006.06040v2

Training a Quantum Neural Network to Solve the Contextual Multi-Armed Bandit Problem

Wei Hu, James Hu
2019 Natural Science  
In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning  ...  Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study.  ...  Contextual Multi-Armed Bandit Problem: In this problem, there are four bandits, each with four arms.  ... 
doi:10.4236/ns.2019.111003

Deep Contextual Multi-armed Bandits [article]

Mark Collier, Hector Urdiales Llorens
2018 arXiv   pre-print
...bandits, and 3) fixed dropout rate deep contextual bandits.  ...  Contextual multi-armed bandit problems arise frequently in important industrial applications.  ...  Both contextual and non-contextual bandits involve making a sequence of decisions on which action to take from an action space A.  ... 
arXiv:1807.09809v1
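The last fragment above describes the shared setup: at each round the learner observes a context, picks an action from an action space A, and sees only the chosen action's reward. As a minimal sketch of that interaction loop (not code from the paper — the synthetic linear-reward environment, the per-arm least-squares estimates, and all names are illustrative assumptions), an epsilon-greedy contextual bandit might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, dim, rounds = 4, 5, 2000

# Synthetic environment (an illustrative assumption): each action's
# expected reward is a hidden linear function of the context.
true_weights = rng.normal(size=(n_actions, dim))

# Per-action ridge-regression statistics, maintained online.
A = np.stack([np.eye(dim) for _ in range(n_actions)])  # X^T X + I per arm
b = np.zeros((n_actions, dim))                          # X^T y per arm

epsilon, total_reward = 0.1, 0.0
for t in range(rounds):
    x = rng.normal(size=dim)                              # observe context
    theta = np.linalg.solve(A, b[..., None])[..., 0]      # per-arm estimates
    if rng.random() < epsilon:                            # explore
        a = int(rng.integers(n_actions))
    else:                                                 # exploit estimates
        a = int(np.argmax(theta @ x))
    r = float(true_weights[a] @ x) + rng.normal(scale=0.1)  # bandit feedback
    A[a] += np.outer(x, x)            # update only the chosen arm's statistics
    b[a] += r * x
    total_reward += r
```

Only the pulled arm's statistics are updated, which is exactly what distinguishes bandit feedback from full supervised feedback.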

AutoML for Contextual Bandits [article]

Praneet Dutta, Joe Cheuk, Jonathan S Kim, Massimo Mascaro
2022 arXiv   pre-print
We propose an end-to-end automated meta-learning pipeline to approximate the optimal Q function for contextual bandit problems.  ...  Contextual Bandits is one of the widely popular techniques used in applications such as personalization, recommendation systems, mobile health, causal marketing, etc.  ...  INTRODUCTION: Contextual Bandits is a class of dynamic algorithms that can be used to efficiently learn targeting strategies.  ... 
arXiv:1909.03212v2

A Survey on Practical Applications of Multi-Armed and Contextual Bandits [article]

Djallel Bouneffouf, Irina Rish
2019 arXiv   pre-print
...performance combined with certain attractive properties, such as learning from less feedback.  ...  The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit  ...  covariates, and propose a novel and efficient bandit algorithm based on the LASSO estimator.  ... 
arXiv:1904.10040v1

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks [article]

Rong Zhu, Mattia Rigotti
2021 arXiv   pre-print
In this paper we introduce Sample Average Uncertainty (SAU), a simple and efficient uncertainty measure for contextual bandits.  ...  Because of its simplicity, SAU can be seamlessly applied to deep contextual bandits as a very scalable drop-in replacement for epsilon-greedy exploration.  ...  Bandit Algorithms: Deep contextual bandits refers to tackling contextual bandits by parameterizing the action-value function as a deep neural network μ(x, θ), thereby leveraging models that have been very  ... 
arXiv:2105.04683v2
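The snippet above mentions parameterizing the action-value function as a deep network μ(x, θ) and positions SAU as a replacement for epsilon-greedy exploration. A minimal numpy sketch of that epsilon-greedy baseline over a network-parameterized action-value function (the tiny two-layer MLP, the learning rate, and the synthetic rewards are all illustrative assumptions; this is not the paper's model, nor the SAU estimator itself):

```python
import numpy as np

rng = np.random.default_rng(2)
n_actions, dim, hidden = 4, 8, 32

# A tiny two-layer network standing in for mu(x, theta): context in,
# one estimated action value per arm out.
W1 = rng.normal(scale=0.1, size=(hidden, dim))
W2 = rng.normal(scale=0.1, size=(n_actions, hidden))

def q_values(x):
    h = np.tanh(W1 @ x)
    return W2 @ h, h

true_w = rng.normal(size=(n_actions, dim))   # hidden linear rewards (illustrative)
epsilon, lr = 0.1, 0.01
for t in range(3000):
    x = rng.normal(size=dim)
    q, h = q_values(x)
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(q))
    r = float(true_w[a] @ x) + rng.normal(scale=0.1)
    # One SGD step on the squared error of the chosen action's value only:
    # bandit feedback never reveals the other arms' rewards.
    err = q[a] - r
    grad_W1 = err * np.outer(W2[a] * (1 - h**2), x)
    W2[a] -= lr * err * h
    W1 -= lr * grad_W1
```

The same loop structure carries over to any deeper network; only the forward pass and the gradient step change.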

Scalable Thompson Sampling via Optimal Transport [article]

Ruiyi Zhang, Zheng Wen, Changyou Chen, Lawrence Carin
2019 arXiv   pre-print
Consequently, efficient computation of an approximate posterior distribution is a crucial problem for scalable TS with complex models, such as neural networks.  ...  Based on this framework, a principled particle-optimization algorithm is developed for TS to approximate the posterior efficiently.  ...  We consider a contextual bandit with k = 8 arms and d = 10-dimensional contexts.  ... 
arXiv:1902.07239v1
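The experiment mentioned in the snippet (k = 8 arms, d = 10-dimensional contexts) can be illustrated with exact per-arm Thompson sampling in the linear-Gaussian case, where the posterior is available in closed form and no particle optimization is needed. A sketch under those assumptions (the synthetic environment and all names are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, rounds = 8, 10, 1000            # arms and context dimension, as in the snippet

true_theta = rng.normal(size=(k, d))  # hidden reward parameters (illustrative)

# Conjugate Gaussian posterior per arm: precision matrix B_a and vector f_a,
# so the posterior mean is B_a^{-1} f_a.
B = np.stack([np.eye(d) for _ in range(k)])
f = np.zeros((k, d))

total_reward = 0.0
for t in range(rounds):
    x = rng.normal(size=d)
    # Thompson sampling: draw one parameter vector per arm from its
    # posterior, then act greedily on the draw.
    sampled = np.empty((k, d))
    for a in range(k):
        cov = np.linalg.inv(B[a])
        sampled[a] = rng.multivariate_normal(cov @ f[a], cov)
    a = int(np.argmax(sampled @ x))
    r = float(true_theta[a] @ x) + rng.normal(scale=0.1)
    B[a] += np.outer(x, x)            # posterior update for the pulled arm
    f[a] += r * x
    total_reward += r
```

With complex models the posterior draw is the expensive step; that is the part the paper replaces with particle optimization.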

Contextual Exploration Using a Linear Approximation Method Based on Satisficing [article]

Akane Minami, Yu Kono, Tatsuji Takahashi
2021 arXiv   pre-print
The results of our experiments indicate that LinRS reduced the number of explorations and run time compared to those of existing algorithms in contextual bandit problems.  ...  LinRS utilizes linear regression and multiclass classification to linearly approximate both the action value and proportion of action selections required in the RS calculation.  ...  Contextual bandit algorithms learn the estimated parameters of the action value for the feature vectors.  ... 
arXiv:2112.06452v1

contextual: Evaluating Contextual Multi-Armed Bandit Problems in R [article]

Robin van Emden, Maurits Kaptein
2020 arXiv   pre-print
..., easily extensible framework that facilitates parallelized comparison of contextual and context-free bandit policies through both simulation and offline analysis.  ...  Over the past decade, contextual bandit algorithms have been gaining in popularity due to their effectiveness and flexibility in solving sequential decision problems---from online advertising and finance  ...  Implemented policies and bandits: Though contextual was designed to make it easy to develop custom bandit and policy classes, it is also possible to run basic simulations with just its built-in bandits  ... 
arXiv:1811.01926v4

A Deep Bayesian Bandits Approach for Anticancer Therapy: Exploration via Functional Prior [article]

Mingyu Lu and Yifang Chen and Su-In Lee
2022 arXiv   pre-print
To address this challenge, we formulate the drug screening study as a "contextual bandit" problem, in which an algorithm selects anticancer therapeutics based on contextual information about cancer cell lines  ...  Learning personalized cancer treatment with machine learning holds great promise to improve cancer patients' chances of survival.  ...  To enable efficient exploration in a contextual bandit setting with drug compound features and gene expression, we use a functional variational approach to approximate this pharmacogenomics posterior.  ... 
arXiv:2205.02944v1

A Map of Bandits for E-commerce [article]

Yi Liu, Lihong Li
2021 arXiv   pre-print
In this paper, we aim to reduce this gap with a structured map of Bandits to help practitioners navigate and find relevant, practical Bandit algorithms.  ...  Instead of providing a comprehensive overview, we focus on a small number of key decision points related to reward, action, and features, which often affect how Bandit algorithms are chosen in practice  ...  Feature Engineering: In many applications, we use features to deal with large context or action sets more efficiently.  ... 
arXiv:2107.00680v1

Adaptive Metamorphic Testing with Contextual Bandits [article]

Helge Spieker, Arnaud Gotlieb
2020 arXiv   pre-print
By using contextual bandits, Adaptive Metamorphic Testing learns which metamorphic relations are likely to transform a source test case such that it has a higher chance of discovering faults.  ...  In this article, we propose Adaptive Metamorphic Testing as a generalization of a simple yet powerful reinforcement learning technique, namely contextual bandits, to select one of multiple metamorphic  ...  Contextual Bandits: The selection of a test transformation to apply to a source test case is formalized as a multi-armed bandit problem with context information, also known as a contextual bandit [12,  ... 
arXiv:1910.00262v3

Online Learning in Contextual Bandits using Gated Linear Networks [article]

Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness
2020 arXiv   pre-print
We introduce a new and completely online contextual bandit algorithm called Gated Linear Contextual Bandits (GLCB).  ...  We empirically evaluate GLCB compared to 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems.  ...  Discussion: We have introduced a new algorithm for both the discrete and continuous contextual bandit settings.  ... 
arXiv:2002.11611v2

Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow [article]

Wenjun Zeng, Yi Liu
2022 arXiv   pre-print
In the simulation study, we observe that the proposed MDP-with-Bandits algorithm outperforms Q-learning with ϵ-greedy and decreasing ϵ, independent Bandits, and interaction Bandits.  ...  The way we formulate the problem allows us to leverage TS's efficiency in balancing exploration and exploitation and Bandits' convenience in modeling actions' incompatibility.  ...  On the other hand, bandit algorithms with TS generally outperform Q-learning with ϵ-greedy, given TS's efficiency in exploration.  ... 
arXiv:2107.00204v2

Greybox fuzzing as a contextual bandits problem [article]

Ketan Patil, Aditya Kanade
2018 arXiv   pre-print
We formalize this problem as a 'contextual bandit problem' and propose an algorithm to solve it. We have implemented our approach on top of AFL.  ...  We fuzz the substring with this new energy value and continuously update the policy based on the interesting test cases produced during fuzzing.  ...  [30] Contextual Bandits Problem: The contextual bandits problem falls in between the full reinforcement learning problem and the multi-armed bandits problem.  ... 
arXiv:1806.03806v1
Showing results 1 — 15 out of 2,742 results