698 Hits in 3.7 sec

Efficient Contextual Bandits in Non-stationary Worlds [article]

Haipeng Luo and Chen-Yu Wei and Alekh Agarwal and John Langford
2019 arXiv   pre-print
In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically  ...  Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications.  ...  However, in many applications of contextual bandits, we are faced with an extremely non-stationary world.  ... 
arXiv:1708.01799v4 fatcat:2a2z5qnflbdp7cxugf5ri6hsti

Context Attentive Bandits: Contextual Bandit with Restricted Context [article]

Djallel Bouneffouf, Irina Rish, Guillermo A. Cecchi, Raphael Feraud
2017 arXiv   pre-print
We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every  ...  This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling.  ...  Contextual Bandit with Restricted Context in Non-stationary Environments In a stationary environment, the context vectors and the rewards are drawn from fixed probability distributions; the objective is  ... 
arXiv:1705.03821v2 fatcat:w4j5vvozlbghno7f2hxenrmqca

A Linear Bandit for Seasonal Environments [article]

Giuseppe Di Benedetto, Vito Bellini, Giovanni Zappella
2020 arXiv   pre-print
Contextual bandit algorithms are extremely popular and widely used in recommendation systems to provide online personalised recommendations.  ...  A recurrent assumption is the stationarity of the reward function, which is rather unrealistic in most of the real-world applications.  ...  SETTING We present a novel contextual bandit algorithm for non-stationary reward functions with seasonality.  ... 
arXiv:2004.13576v1 fatcat:k4cf436vfbaijgo2ofza2glyrq

Context Attentive Bandits: Contextual Bandit with Restricted Context

Djallel Bouneffouf, Irina Rish, Guillermo Cecchi, Raphaël Féraud
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every  ...  This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known  ...  Contextual Bandit with Restricted Context in Non-stationary Environments In a stationary environment, the context vectors and the rewards are drawn from fixed probability distributions; the objective is  ... 
doi:10.24963/ijcai.2017/203 dblp:conf/ijcai/BouneffoufRCF17 fatcat:cp2ixhkmmjb3pn2chnba42uanm

A Survey on Practical Applications of Multi-Armed and Contextual Bandits [article]

Djallel Bouneffouf, Irina Rish
2019 arXiv   pre-print
In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar  ...  This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit.  ...  For instance, the non-stationary contextual bandit could be useful in the non-stationary feature selection setting, where finding the right features is time-dependent and context-dependent when the environment  ... 
arXiv:1904.10040v1 fatcat:j6v37wy7f5bmvpfzzhtnutbeoa

A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit [article]

Giuseppe Burtini, Jason Loeppky, Ramon Lawrence
2015 arXiv   pre-print
Adaptive and sequential experiment design is a well-studied area in numerous domains.  ...  We survey and synthesize the work of the online statistical learning paradigm referred to as multi-armed bandits integrating the existing research as a resource for a certain class of online experiments  ...  a world where resistances, community bacterial loads and other factors may be evolving in an unmodelled or inherently non-stationary way.  ... 
arXiv:1510.00757v4 fatcat:eyxqdq3yl5fpdbv53wtnkfa25a

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li
2020 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users.  ...  Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.  ...  Conclusions and Future Work We studied a contextual bandit problem for personalized recommendation in a non-stationary environment.  ... 
doi:10.1609/aaai.v34i04.6125 fatcat:d3fjvknpgrhqdaesmko6inywba

Hyper-parameter Tuning for the Contextual Bandit [article]

Djallel Bouneffouf, Emmanuelle Claeys
2020 arXiv   pre-print
In the traditional algorithms that solve the contextual bandit problem, the exploration is a parameter that is tuned by the user.  ...  We study here the problem of learning the exploration-exploitation trade-off in the contextual bandit problem with a linear reward function setting.  ...  the following two scenarios, one where we evaluate the algorithms in a stationary environment, and the second with a non-stationary environment.  ... 
arXiv:2005.02209v1 fatcat:fqornlf4t5anvbsqomg5yvci2q

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests [article]

Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li
2020 arXiv   pre-print
A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users.  ...  Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.  ...  Conclusions and Future Work We studied a contextual bandit problem for personalized recommendation in a non-stationary environment.  ... 
arXiv:2003.00359v1 fatcat:zzs4qxkhv5b7vlftbaxlg5mohe

Guest editorial: special issue on reinforcement learning for real life

Yuxi Li, Alborz Geramifard, Lihong Li, Csaba Szepesvari, Tao Wang
2021 Machine Learning  
In the article titled "Inverse Reinforcement Learning in Contextual MDPs", the authors Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, and Tom Zahavy formulate the contextual inverse RL  ...  RL has seen prominent successes in many problems, such as Atari games, AlphaGo, robotics, recommender systems, and AutoML. However, applying RL in the real world remains challenging.  ...  off-policy techniques to train and evaluate a contextual bandit model for troubleshooting notification in a chatbot, considering a null action, a limited number of bandit arms, small data, reward design  ... 
doi:10.1007/s10994-021-06041-3 fatcat:ew3uhfhhevdd5khjeq2umhby7a

Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward [article]

Baihan Lin
2020 arXiv   pre-print
Our experiments on a variety of datasets, both in stationary and non-stationary environments of six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit.  ...  Conclusion We introduced an extension of the contextual bandit problem, learning from episodically revealed reward, motivated by several real-world applications in non-stationary environments, including  ... 
arXiv:2009.08457v2 fatcat:vhvcravl2fg7dmyjoptfwsrnmu

Policy Gradients for Contextual Recommendations

Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, Qing He
2019 The World Wide Web Conference on - WWW '19  
PGCR can solve the standard contextual bandit problem as well as its Markov Decision Process generalization.  ...  We evaluate PGCR on toy datasets as well as a real-world dataset of personalized music recommendations.  ...  which results in lower sample efficiency.  ... 
doi:10.1145/3308558.3313616 dblp:conf/www/PanCTZH19 fatcat:ijck6gknk5dkfd5n2npsybbxmq

Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies [article]

Alejandro Romero, Gianluca Baldassarre, Richard J. Duro, Vieri Giuliano Santucci
2022 arXiv   pre-print
are non-stationary.  ...  of interdependent tasks, and even fewer tackled scenarios where goals involve non-stationary interdependencies.  ...  In a second experiment (Sec. 4.2) we test H-GRAIL in a similar robotic scenario, where interdependencies between goals are non-stationary (in particular, they change after a certain time during learning  ... 
arXiv:2205.07562v1 fatcat:wpq6akjmlzbw7hztxyxzcwvuya

A Map of Bandits for E-commerce [article]

Yi Liu, Lihong Li
2021 arXiv   pre-print
In this paper, we aim to reduce this gap with a structured map of Bandits to help practitioners navigate to find relevant and practical Bandit algorithms.  ...  While these are valuable resources, there exists a gap in mapping applications to appropriate Bandit algorithms.  ...  Offline evaluation for non-stationary Bandits remains challenging [47, 48] , with opportunities for further research.  ... 
arXiv:2107.00680v1 fatcat:7gl37h4yrrbfhdy4q5eyk7usbq

Adversarial Attacks on Linear Contextual Bandits [article]

Evrard Garcelon, Baptiste Roziere, Laurent Meunier, Jean Tarbouriech, Olivier Teytaud, Alessandro Lazaric, Matteo Pirotta
2020 arXiv   pre-print
Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education.  ...  In this paper, we study several attack scenarios and show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm T - o(T) times over a horizon of T steps, while applying  ...  Contextual ACE transforms the original problem into a stationary bandit problem in which there is a targeted arm that is optimal for all contexts and all non targeted arms have expected reward of 0.  ... 
arXiv:2002.03839v3 fatcat:oweqjzh4erh7pfosmss2sovvcm
Showing results 1 — 15 out of 698 results