Contextual Multi-Armed Bandits for Causal Marketing
[article]
2018
arXiv
pre-print
This work explores the idea of a causal contextual multi-armed bandit approach to automated marketing, where we estimate and optimize the causal (incremental) effects. ...
Our approach draws on strengths of causal inference, uplift modeling, and multi-armed bandits. ...
Algorithm 2: Thompson Sampling based Contextual Multi-Armed Bandits with Online Scoring and Batch Training. Initialization: time t = 0; event log L = {}; d-dimensional bandit arm contextual distribution ...
arXiv:1810.01859v1
fatcat:ixqhpj2wkzextkxq2d4oj7jdgy
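The Algorithm 2 snippet above pairs online Thompson-sampling scoring with periodic batch training. Below is a minimal sketch of that pattern, assuming a per-arm Bayesian linear-regression reward model; the paper's exact posterior and update schedule may differ.

```python
import numpy as np

class LinearTS:
    """Per-arm Bayesian linear regression with Thompson sampling."""

    def __init__(self, n_arms, d, sigma2=1.0, prior_var=1.0):
        self.sigma2 = sigma2
        # Gaussian posterior over each arm's weights: N(A^-1 b, A^-1)
        self.A = [np.eye(d) / prior_var for _ in range(n_arms)]
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def select(self, x):
        """Online scoring: draw one weight sample per arm, play the argmax."""
        scores = []
        for A, b in zip(self.A, self.b):
            cov = np.linalg.inv(A)
            w = np.random.multivariate_normal(cov @ b, cov)
            scores.append(x @ w)
        return int(np.argmax(scores))

    def batch_update(self, log):
        """Batch training: fold logged (arm, context, reward) into posteriors."""
        for arm, x, r in log:
            self.A[arm] += np.outer(x, x) / self.sigma2
            self.b[arm] += r * x / self.sigma2
```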
Treatment effect optimisation in dynamic environments
2022
Journal of Causal Inference
Incorporating this target creates a causal model which we name an uplifted contextual multi-armed bandit. ...
Applying causal methods to fields such as healthcare, marketing, and economics receives increasing interest. ...
Acknowledgments: We wish to thank Vincent Ginis, Ann Nowé, Judea Pearl and our reviewers for their helpful comments. Funding information: JB is funded by the W.D. Armstrong Trust Fund. ...
doi:10.1515/jci-2020-0009
fatcat:tyt3pmevgbhl5iduomcz54eszu
Optimising Individual-Treatment-Effect Using Bandits
[article]
2019
arXiv
pre-print
To counter this, we propose the uplifted contextual multi-armed bandit (U-CMAB), a novel approach to optimise the ITE by drawing upon bandit literature. ...
Take for example the negative influence on a marketing campaign when a competitor product is released. ...
Contextual multi-armed bandits (CMAB) differ from UM as they apply treatment as a function of expected response only. ...
arXiv:1910.07265v1
fatcat:s2w2pqv55beqrp6rvtvwbsla64
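To make the uplift idea above concrete: where a standard CMAB scores arms by expected response, an uplift-based approach scores them by estimated incremental effect over control. The two-model estimator below is one common way to obtain such estimates and is only an illustration, not necessarily the authors' U-CMAB construction; the control group is assumed to be coded as treatment 0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_uplift_models(X, treatment, y):
    """Fit one response model per group; treatment 0 is assumed to be control."""
    return {t: LogisticRegression().fit(X[treatment == t], y[treatment == t])
            for t in np.unique(treatment)}

def uplift_scores(models, x):
    """Estimated incremental effect of each treatment arm vs. control at x."""
    x = np.asarray(x).reshape(1, -1)
    p_control = models[0].predict_proba(x)[0, 1]
    return {t: m.predict_proba(x)[0, 1] - p_control
            for t, m in models.items() if t != 0}
```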
Bandit Algorithms for Precision Medicine
[article]
2021
arXiv
pre-print
Since precision medicine focuses on the use of patient characteristics to guide treatment, contextual bandit algorithms are especially useful since they are designed to take such information into account ...
The Oxford English Dictionary defines precision medicine as "medical care designed to optimize efficiency or therapeutic benefit for particular groups of patients, especially by using genetic or molecular ...
Multi-armed Bandit: In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in many application areas such as healthcare, marketing, and recommendation systems. ...
arXiv:2108.04782v1
fatcat:dni5wyzyerestgs3upuzz776n4
A Causal Approach to Prescriptive Process Monitoring
2021
International Conference on Business Process Management
Contextual bandits are an extension of multi-armed bandits. They output an action conditional on the state of the environment. ...
In particular, contextual bandit algorithms have benefited from the causal inference literature, making them less prone to estimation bias [15]. ...
dblp:conf/bpm/Bozorgi21
fatcat:lvxhrpdxdjglzavhhbh3rrsvxq
A Map of Bandits for E-commerce
[article]
2021
arXiv
pre-print
The rich body of Bandit literature not only offers a diverse toolbox of algorithms, but also makes it hard for a practitioner to find the right solution to solve the problem at hand. ...
In this paper, we aim to reduce this gap with a structured map of Bandits to help practitioners navigate to find relevant and practical Bandit algorithms. ...
Best-arm Identification: In some bandit applications, our goal is not to maximize reward during an experiment, but to identify the best action (e.g., best marketing campaign strategy) at the end of the ...
arXiv:2107.00680v1
fatcat:7gl37h4yrrbfhdy4q5eyk7usbq
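For the best-arm identification setting mentioned above, here is a minimal sketch of successive elimination: arms are sampled in rounds and dropped once their empirical mean falls clearly below the leader's. The confidence radius and stopping rule are illustrative textbook choices, not taken from the paper.

```python
import math
import numpy as np

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=10_000):
    """pull(arm) -> reward in [0, 1]; returns the identified best arm."""
    active = list(range(n_arms))
    means, counts = np.zeros(n_arms), np.zeros(n_arms)
    for t in range(1, max_rounds + 1):
        for a in active:                      # sample every surviving arm once
            r = pull(a)
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]
        # Hoeffding-style confidence radius; shrinks as rounds accumulate
        rad = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        leader = max(means[a] for a in active)
        active = [a for a in active if means[a] >= leader - 2 * rad]
        if len(active) == 1:
            break
    return max(active, key=lambda a: means[a])
```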
VacSIM: Learning Effective Strategies for COVID-19 Vaccine Distribution using Reinforcement Learning
[article]
2021
arXiv
pre-print
We approach this problem by proposing a novel pipeline, VacSIM, that dovetails Deep Reinforcement Learning models into a Contextual Bandits approach for optimizing the distribution of COVID-19 vaccines. ...
Whereas the Reinforcement Learning models suggest better actions and rewards, Contextual Bandits allow online modifications that may need to be implemented on a day-to-day basis in real-world scenarios ...
Tavpritesh Sethi and the Center for Artificial Intelligence at IIIT-Delhi. ...
arXiv:2009.06602v3
fatcat:2yfa3xapyna5lnlryuq6237uae
Multi-armed bandit experiments in the online service economy
2015
Applied Stochastic Models in Business and Industry
This article briefly summarizes multi-armed bandit experiments, where the experimental design is modified as the experiment progresses to reduce the cost of experimenting. ...
Contextual information: The multi-armed bandit can be sensitive to the assumed model for the rewards distribution. ...
Multi-armed bandit experiments: A multi-armed bandit is a sequential experiment where the goal is to produce the largest reward. In the typical setup there are K actions or "arms." ...
doi:10.1002/asmb.2104
fatcat:c23qh6fznfddhimr2he7qheyta
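As a concrete instance of the K-armed setup described above, here is a minimal Bernoulli bandit with Thompson sampling (a Beta posterior per arm). This is a standard construction offered only as a sketch, not code from the article.

```python
import numpy as np

def thompson_bernoulli(pull, n_arms, horizon):
    """pull(arm) -> 0/1 reward; returns posterior mean reward per arm."""
    alpha = np.ones(n_arms)   # prior successes + 1
    beta = np.ones(n_arms)    # prior failures + 1
    for _ in range(horizon):
        arm = int(np.argmax(np.random.beta(alpha, beta)))  # one draw per arm
        r = pull(arm)
        alpha[arm] += r
        beta[arm] += 1 - r
    return alpha / (alpha + beta)
```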
AutoML for Contextual Bandits
[article]
2022
arXiv
pre-print
Contextual bandits are one of the most widely used techniques in applications such as personalization, recommendation systems, mobile health, and causal marketing. ...
We propose an end-to-end automated meta-learning pipeline to approximate the optimal Q function for contextual bandit problems. ...
It is an extension of the multi-armed bandit problem [14], generalizing it with the concept of a context. ...
arXiv:1909.03212v2
fatcat:scixdaefijbupj4nw7wcdsesva
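The core object this pipeline automates is a learned Q function mapping (context, arm) to expected reward, from which a greedy policy follows. The sketch below uses an off-the-shelf regressor over contexts with one-hot arm encodings; the model class and encoding are assumptions for illustration, not the paper's meta-learned architecture.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_q(contexts, arms, rewards, n_arms):
    """One regressor over (context, one-hot arm) pairs as Q(x, a)."""
    X = np.hstack([contexts, np.eye(n_arms)[arms]])
    return GradientBoostingRegressor().fit(X, rewards)

def greedy_action(q, x, n_arms):
    """Score every arm for context x and return the argmax."""
    X = np.hstack([np.tile(x, (n_arms, 1)), np.eye(n_arms)])
    return int(np.argmax(q.predict(X)))
```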
Uplift Modeling for Multiple Treatments with Cost Optimization
[article]
2020
arXiv
pre-print
It can be used for optimizing the performance of interventions such as marketing campaigns and product designs. ...
An important but so far neglected use case for uplift modeling is an experiment with multiple treatment groups that have different costs, for example when different communication channels and promotion ...
for collaboration on use cases. ...
arXiv:1908.05372v3
fatcat:h4v2daihonejpnzo7fs6ppnp3e
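A minimal sketch of the cost-aware decision rule implied above: value each treatment's estimated uplift in currency, subtract its cost, and apply the best treatment only when its net value is positive. The netting rule and the value_per_conversion parameter are illustrative assumptions, not the paper's exact formulation.

```python
def best_treatment(uplift, cost, value_per_conversion):
    """uplift/cost: dicts treatment -> estimated uplift / unit cost.
    Returns the treatment with the highest positive net value, else None."""
    net = {t: value_per_conversion * uplift[t] - cost[t] for t in uplift}
    t_best = max(net, key=net.get)
    return t_best if net[t_best] > 0 else None  # None = apply no treatment
```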
Rate-Optimal Contextual Online Matching Bandit
[article]
2022
arXiv
pre-print
Existing works focus on multi-armed bandits with static preferences, but this is insufficient: the two-sided preferences change as long as one side's contextual information updates, resulting in non-static ...
This motivates us to consider a novel Contextual Online Matching Bandit prOblem (COMBO), which allows dynamic preferences in matching decisions. ...
However, these works do not consider the arms' contextual information and hence are not capable of tackling our dynamic matching problem. Centralized Multi-Agent Bandit for Matching. ...
arXiv:2205.03699v1
fatcat:zn5e42g3dnhgljyup4k6l5gk3a
Reinforcement Learning in Practice: Opportunities and Challenges
[article]
2022
arXiv
pre-print
Then we discuss opportunities for RL, in particular in products and services, games, bandits, recommender systems, robotics, transportation, finance and economics, healthcare, education, combinatorial optimization ...
These are developed in the settings of multi-armed bandits, but are applicable to RL problems. We discuss bandits in Section 3.3. ...
Contextual bandits are a "mature" technique that can be widely applied. RL is "mature" for many single-, two-, and multi-player games. ...
arXiv:2202.11296v2
fatcat:xdtsmme22rfpfn6rgfotcspnhy
Optimizing peer referrals for public awareness using contextual bandits
2019
Proceedings of the Conference on Computing & Sustainable Societies - COMPASS '19
Given the lack of initial information about the social network or how people respond to referral incentives, we use an explore-exploit strategy and present a contextual bandit agent CoBBI that optimizes ...
With a fixed budget for referral incentives, a natural goal for such referral programs is to maximize the number of people reached. ...
This strategy, known as an ϵ-greedy multi-armed bandit, works well for a variety of decision optimization problems [16]. ...
doi:10.1145/3314344.3332497
dblp:conf/dev/MothilalYS19
fatcat:e3n2fjkttveodp67ji3kj6ve3a
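The ϵ-greedy strategy cited above is simple enough to state in a few lines: with probability ϵ pick a random arm, otherwise pick the arm with the best empirical mean. This generic sketch is not the CoBBI agent itself, which additionally conditions on context.

```python
import random

def eps_greedy(pull, n_arms, horizon, eps=0.1):
    """pull(arm) -> reward; returns empirical mean reward per arm."""
    counts, means = [0] * n_arms, [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:                 # pull each arm once to initialize
            arm = t
        elif random.random() < eps:    # explore
            arm = random.randrange(n_arms)
        else:                          # exploit the best empirical mean
            arm = max(range(n_arms), key=lambda a: means[a])
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return means
```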
Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs
[article]
2021
arXiv
pre-print
Finally, we demonstrate the performance advantages of our algorithm on large-scale bandit and traffic intersection problems, providing a novel contribution to the latter in the form of a spatial approximation ...
Factored policy gradients (FPGs), which follow, provide a common framework for analysing key state-of-the-art algorithms, are shown to generalise traditional policy gradients, and yield a principled way ...
Acknowledgments and Disclosure of Funding The authors would like to acknowledge our colleagues Joshua Lockhart, Jason Long and Rui Silva for their input and suggestions at various key stages of the research ...
arXiv:2102.10362v3
fatcat:l7vtjc7kanfk7dgxnp2c3jc4oq
Efficient Counterfactual Learning from Bandit Feedback
2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and the Thirty-First Innovative Applications of Artificial Intelligence Conference
For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. ...
What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? ...
We are grateful to seminar participants at ICML/IJCAI/AAMAS Workshop "Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML)" and RIKEN Center for Advanced Intelligence ...
doi:10.1609/aaai.v33i01.33014634
fatcat:lyqkmh3t45ailcpmkyeczlfaiu
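For reference, the standard baseline for the off-policy estimation problem posed above is inverse propensity scoring (IPS), sketched below; the paper is concerned with estimators that are more statistically efficient than this. The log format (context, action, reward, logging probability) is an assumption for illustration.

```python
import numpy as np

def ips_value(log, pi):
    """log: iterable of (context, action, reward, logging_prob);
    pi(action, context) -> probability the target policy takes `action`."""
    weights = np.array([pi(a, x) / p for x, a, r, p in log])
    rewards = np.array([r for _, _, r, _ in log])
    return float(np.mean(weights * rewards))
```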
Showing results 1 — 15 out of 536 results