Efficient Contextual Bandits in Non-stationary Worlds
[article]
2019
arXiv
pre-print
In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically ...
Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. ...
However, in many applications of contextual bandits, we are faced with an extremely non-stationary world. ...
arXiv:1708.01799v4
fatcat:2a2z5qnflbdp7cxugf5ri6hsti
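The snippet above describes equipping stationary (i.i.d.) methods with statistical tests that detect distribution change. As a rough illustration of that test-then-restart pattern only (the paper's actual tests are more sophisticated), here is a minimal Python sketch; the class name, the mean-shift test, and all parameters are hypothetical:

```python
import numpy as np

class RestartOnDriftBandit:
    """Hypothetical wrapper: epsilon-greedy learner that restarts when a
    simple mean-shift test fires on an arm's recent rewards."""

    def __init__(self, n_arms, window=50, threshold=0.2, eps=0.1, seed=0):
        self.n_arms, self.window = n_arms, window
        self.threshold, self.eps = threshold, eps
        self.rng = np.random.default_rng(seed)
        self._reset()

    def _reset(self):
        self.counts = np.zeros(self.n_arms)
        self.sums = np.zeros(self.n_arms)
        self.recent = [[] for _ in range(self.n_arms)]  # sliding reward windows

    def select(self):
        if self.rng.random() < self.eps or self.counts.min() == 0:
            return int(self.rng.integers(self.n_arms))
        return int(np.argmax(self.sums / self.counts))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        buf = self.recent[arm]
        buf.append(reward)
        if len(buf) > self.window:
            buf.pop(0)
        # Crude drift test: recent window mean vs. this arm's long-run mean.
        if len(buf) == self.window:
            long_run = self.sums[arm] / self.counts[arm]
            if abs(np.mean(buf) - long_run) > self.threshold:
                self._reset()  # forget stale statistics and re-explore
```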
Context Attentive Bandits: Contextual Bandit with Restricted Context
[article]
2017
arXiv
pre-print
We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every ...
This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. ...
Contextual Bandit with Restricted Context in Non-stationary Environments: In a stationary environment, the context vectors and the rewards are drawn from fixed probability distributions; the objective is ...
arXiv:1705.03821v2
fatcat:w4j5vvozlbghno7f2hxenrmqca
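To make the restricted-context setting concrete, here is a minimal sketch in which the learner observes only k of d features per round and runs per-arm linear Thompson sampling on the masked context. The random mask stands in for the paper's learned feature selection; the dimensions, noise level, and variable names are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_arms, T = 10, 3, 4, 2000
theta = rng.normal(size=(n_arms, d))        # unknown per-arm reward weights

# Per-arm Bayesian linear-regression state for Thompson sampling.
A = np.stack([np.eye(d) for _ in range(n_arms)])    # precision matrices
b = np.zeros((n_arms, d))

for t in range(T):
    x = rng.normal(size=d)
    observed = rng.choice(d, size=k, replace=False)  # the restricted context
    x_masked = np.zeros(d)
    x_masked[observed] = x[observed]
    # Thompson sampling using only the observed (masked) features.
    scores = []
    for a in range(n_arms):
        cov = np.linalg.inv(A[a])
        mean = cov @ b[a]
        scores.append(rng.multivariate_normal(mean, cov) @ x_masked)
    arm = int(np.argmax(scores))
    reward = theta[arm] @ x + rng.normal(scale=0.1)  # full context drives reward
    A[arm] += np.outer(x_masked, x_masked)
    b[arm] += reward * x_masked
```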
A Linear Bandit for Seasonal Environments
[article]
2020
arXiv
pre-print
Contextual bandit algorithms are extremely popular and widely used in recommendation systems to provide online personalised recommendations. ...
A recurrent assumption is the stationarity of the reward function, which is rather unrealistic in most of the real-world applications. ...
Setting: We present a novel contextual bandit algorithm for non-stationary reward functions with seasonality. ...
arXiv:2004.13576v1
fatcat:k4cf436vfbaijgo2ofza2glyrq
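One simple way to let a linear bandit cope with a seasonal reward is to append periodic time features to the context, so that a standard LinUCB can fit the seasonal component. The sketch below illustrates the setting only, not the paper's algorithm; the period, feature map, and toy reward are assumptions:

```python
import numpy as np

def seasonal_features(x, t, period=100.0):
    """Append sin/cos of the seasonal phase to the raw context."""
    phase = 2.0 * np.pi * t / period
    return np.concatenate([x, [np.sin(phase), np.cos(phase)]])

rng = np.random.default_rng(1)
d, n_arms, T, alpha = 5, 3, 1000, 1.0
dim = d + 2
A = np.stack([np.eye(dim) for _ in range(n_arms)])  # per-arm ridge matrices
b = np.zeros((n_arms, dim))

for t in range(T):
    z = seasonal_features(rng.normal(size=d), t)
    ucb = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        ucb.append(A_inv @ b[a] @ z + alpha * np.sqrt(z @ A_inv @ z))
    arm = int(np.argmax(ucb))
    # Toy seasonal reward: each arm peaks at a different phase of the cycle.
    reward = np.sin(2.0 * np.pi * t / 100.0 + arm) + rng.normal(scale=0.1)
    A[arm] += np.outer(z, z)
    b[arm] += reward * z
```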
Context Attentive Bandits: Contextual Bandit with Restricted Context
2017
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every ...
This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known ...
Contextual Bandit with Restricted Context in Non-stationary Environments: In a stationary environment, the context vectors and the rewards are drawn from fixed probability distributions; the objective is ...
doi:10.24963/ijcai.2017/203
dblp:conf/ijcai/BouneffoufRCF17
fatcat:cp2ixhkmmjb3pn2chnba42uanm
A Survey on Practical Applications of Multi-Armed and Contextual Bandits
[article]
2019
arXiv
pre-print
In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar ...
This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. ...
For instance, the non-stationary contextual bandit could be useful in the non-stationary feature selection setting, where finding the right features is time-dependent and context-dependent when the environment ...
arXiv:1904.10040v1
fatcat:j6v37wy7f5bmvpfzzhtnutbeoa
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit
[article]
2015
arXiv
pre-print
Adaptive and sequential experiment design is a well-studied area in numerous domains. ...
We survey and synthesize the work of the online statistical learning paradigm referred to as multi-armed bandits, integrating the existing research as a resource for a certain class of online experiments ...
a world where resistances, community bacterial loads and other factors may be evolving in an unmodelled or inherently non-stationary way. ...
arXiv:1510.00757v4
fatcat:eyxqdq3yl5fpdbv53wtnkfa25a
Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. ...
Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings. ...
Conclusions and Future Work: We studied a contextual bandit problem for personalized recommendation in a non-stationary environment. ...
doi:10.1609/aaai.v34i04.6125
fatcat:d3fjvknpgrhqdaesmko6inywba
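A common way to track time-varying user interests with a linear bandit is to discount old evidence geometrically, so the estimate forgets stale interactions. The sketch below shows only that discounted-update pattern; the paper's change-detection machinery is not reproduced, and all names and constants are hypothetical:

```python
import numpy as np

class DiscountedLinUCB:
    """Hypothetical discounted LinUCB: old evidence decays with factor gamma."""

    def __init__(self, dim, gamma=0.98, alpha=1.0, lam=1.0):
        self.gamma, self.alpha, self.lam = gamma, alpha, lam
        self.A = lam * np.eye(dim)      # discounted design matrix + prior
        self.b = np.zeros(dim)

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        return A_inv @ self.b @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        eye = np.eye(len(x))
        # Geometric discounting shrinks old evidence back toward the prior.
        self.A = self.gamma * self.A + (1.0 - self.gamma) * self.lam * eye \
                 + np.outer(x, x)
        self.b = self.gamma * self.b + reward * x
```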
Hyper-parameter Tuning for the Contextual Bandit
[article]
2020
arXiv
pre-print
In traditional algorithms that solve the contextual bandit problem, exploration is a parameter tuned by the user. ...
We study here the problem of learning the exploration-exploitation trade-off in the contextual bandit problem with a linear reward function setting. ...
the following two scenarios: one where we evaluate the algorithms in a stationary environment, and a second with a non-stationary environment. ...
arXiv:2005.02209v1
fatcat:fqornlf4t5anvbsqomg5yvci2q
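One standard way to learn an exploration parameter online is a "bandit over bandits": an outer adversarial bandit (here EXP3) draws the exploration level each round and is rewarded with whatever the inner contextual bandit earned. The paper's exact scheme may differ; the grid, learning rate, and function names below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
alphas = np.array([0.1, 0.5, 1.0, 2.0])   # candidate exploration levels
weights = np.ones(len(alphas))            # EXP3 weights over the candidates
eta = 0.05                                # outer learning rate

def draw_alpha():
    """Sample an exploration level; return its index and probability."""
    probs = weights / weights.sum()
    i = int(rng.choice(len(alphas), p=probs))
    return i, probs[i]

def feed_back(i, prob, reward):
    """EXP3 importance-weighted update; reward assumed rescaled to [0, 1]."""
    weights[i] *= np.exp(eta * reward / prob)

# Per round: i, p = draw_alpha(); run the inner LinUCB with alphas[i];
# then call feed_back(i, p, reward_obtained).
```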
Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
[article]
2020
arXiv
pre-print
A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. ...
Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings. ...
Conclusions and Future Work: We studied a contextual bandit problem for personalized recommendation in a non-stationary environment. ...
arXiv:2003.00359v1
fatcat:zzs4qxkhv5b7vlftbaxlg5mohe
Guest editorial: special issue on reinforcement learning for real life
2021
Machine Learning
In the article titled "Inverse Reinforcement Learning in Contextual MDPs", the authors Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, and Tom Zahavy formulate the contextual inverse RL ...
RL has seen prominent successes in many problems, such as Atari games, AlphaGo, robotics, recommender systems, and AutoML. However, applying RL in the real world remains challenging. ...
off-policy techniques to train and evaluate a contextual bandit model for troubleshooting notification in a chatbot, considering a null action, a limited number of bandit arms, small data, reward design ...
doi:10.1007/s10994-021-06041-3
fatcat:ew3uhfhhevdd5khjeq2umhby7a
Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
[article]
2020
arXiv
pre-print
Our experiments on a variety of datasets, both in stationary and non-stationary environments of six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit. ...
Conclusion: We introduced an extension of the contextual bandit problem, learning from episodically revealed reward, motivated by several real-world applications in non-stationary environments, including ...
arXiv:2009.08457v2
fatcat:vhvcravl2fg7dmyjoptfwsrnmu
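In the episodic-reward setting the reward is revealed only at episode boundaries; between reveals, one simple strategy is to update on a pseudo-reward imputed from the learner's current estimate. The snippet below sketches that imputation idea only, with hypothetical names, not the paper's algorithm:

```python
import numpy as np

def reward_for_update(theta_hat, x, arm, revealed_reward=None):
    """Use the real reward when the episode reveals it, else impute."""
    if revealed_reward is not None:
        return float(revealed_reward)
    return float(theta_hat[arm] @ x)    # pseudo-reward from current estimate

# Example: theta_hat is a hypothetical (n_arms, d) matrix of per-arm estimates.
theta_hat = np.zeros((3, 5))
x = np.ones(5)
r = reward_for_update(theta_hat, x, arm=1)   # no reveal, so r is imputed
```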
Policy Gradients for Contextual Recommendations
2019
The World Wide Web Conference on - WWW '19
PGCR can solve the standard contextual bandit problem as well as its Markov Decision Process generalization. ...
We evaluate PGCR on toy datasets as well as a real-world dataset of personalized music recommendations. ...
which results in lower sample efficiency. ...
doi:10.1145/3308558.3313616
dblp:conf/www/PanCTZH19
fatcat:ijck6gknk5dkfd5n2npsybbxmq
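For intuition, the one-step policy-gradient update for a softmax policy in the contextual bandit case looks like the generic REINFORCE sketch below; PGCR's specific techniques for improving sample efficiency are not reproduced here, and the toy reward function is an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_arms, lr, T = 5, 3, 0.05, 2000
W = np.zeros((n_arms, d))       # softmax-policy parameters
baseline = 0.0                  # running-average reward as a variance reducer

for t in range(T):
    x = rng.normal(size=d)
    logits = W @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    arm = int(rng.choice(n_arms, p=probs))
    # Toy reward, purely for illustration.
    reward = float(x[0] * (arm == 0) + 0.5 * (arm == 1)) + rng.normal(scale=0.1)
    # Gradient of log pi(arm | x) for a softmax policy: (e_arm - probs) outer x.
    grad = -np.outer(probs, x)
    grad[arm] += x
    W += lr * (reward - baseline) * grad
    baseline += 0.01 * (reward - baseline)
```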
Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies
[article]
2022
arXiv
pre-print
are non-stationary. ...
of interdependent tasks, and even fewer tackled scenarios where goals involve non-stationary interdependencies. ...
In a second experiment (Sec. 4.2) we test H-GRAIL in a similar robotic scenario, where interdependencies between goals are non-stationary (in particular, they change after a certain time during learning ...
arXiv:2205.07562v1
fatcat:wpq6akjmlzbw7hztxyxzcwvuya
A Map of Bandits for E-commerce
[article]
2021
arXiv
pre-print
In this paper, we aim to reduce this gap with a structured map of Bandits to help practitioners navigate to find relevant and practical Bandit algorithms. ...
While these are valuable resources, there exists a gap in mapping applications to appropriate Bandit algorithms. ...
Offline evaluation for non-stationary Bandits remains challenging [47, 48], with opportunities for further research. ...
arXiv:2107.00680v1
fatcat:7gl37h4yrrbfhdy4q5eyk7usbq
Adversarial Attacks on Linear Contextual Bandits
[article]
2020
arXiv
pre-print
Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education. ...
In this paper, we study several attack scenarios and show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm T - o(T) times over a horizon of T steps, while applying ...
Contextual ACE transforms the original problem into a stationary bandit problem in which there is a targeted arm that is optimal for all contexts and all non-targeted arms have expected reward of 0. ...
arXiv:2002.03839v3
fatcat:oweqjzh4erh7pfosmss2sovvcm
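The attack described in the snippet can be illustrated at a very high level: the attacker perturbs observed rewards so that every non-targeted arm appears to have expected reward 0, leaving the targeted arm optimal for all contexts. The sketch below shows only that perturbation rule, with hypothetical names; the paper's analysis of how little total perturbation suffices is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(4)
TARGET_ARM = 2  # the arm the hypothetical attacker wants pulled

def poison(arm, true_reward, noise_scale=0.05):
    """Return the reward the learner sees after the attack."""
    if arm == TARGET_ARM:
        return true_reward                # leave the targeted arm untouched
    return rng.normal(scale=noise_scale)  # non-targeted arms look worthless
```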
Showing results 1–15 of 698.