
Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation [article]

Kai Wang, Zhene Zou, Qilin Deng, Runze Wu, Jianrong Tao, Changjie Fan, Liang Chen, Peng Cui
2021 arXiv   pre-print
We develop a model-based reinforcement learning framework, called GoalRec.  ...  In recent years, there are great interests as well as challenges in applying reinforcement learning (RL) to recommendation systems (RS).  ...  Figure 1: (a) The traditional reinforcement learning framework. (b)  ...  Figure 2: The definition of measurement for a typical item recommender system.  ... 
arXiv:2104.02981v2 fatcat:ktmwxxix25clhkke5gzxbwi5pu

A Survey on Reinforcement Learning for Recommender Systems [article]

Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huanhuan Chen, Chunyan Miao
2022 arXiv   pre-print
In particular, Reinforcement Learning (RL) based recommender systems have become an emerging research topic in recent years.  ...  Empirical results show that RL-based recommendation methods often surpass most of supervised learning methods, owing to the interactive nature and autonomous learning ability.  ...  Furthermore, Pseudo Dyna-Q (PDQ) [57] is proposed to ensure the stability of convergence and low computation cost of existing algorithms.  ... 
arXiv:2109.10665v2 fatcat:wx5ghn66hzg7faxee54jf7gspq

DARES: An Asynchronous Distributed Recommender System using Deep Reinforcement Learning

Bichen Shi, Elias Z. Tragos, Makbule Gulcin Ozsoy, Ruihai Dong, Neil Hurley, Barry Smyth, Aonghus Lawlor
2021 IEEE Access  
Overview of the privacy-preserving distributed deep reinforcement learning (DARES) framework.  ...  Index terms: recommender systems, reinforcement learning, distributed learning, click through ratio.  ...  Later, they proposed Pseudo Dyna-Q (PDQ) [62], which is based on Dyna-Q [30], [40].  ... 
doi:10.1109/access.2021.3087406 fatcat:ybn7dqamtjdzjgsavctwvotdcq

An Empirical Study on Deep Neural Network Models for Chinese Dialogue Generation

Zhe Li, Mieradilijiang Maimaiti, Jiabao Sheng, Zunwang Ke, Wushour Silamu, Qinyong Wang, Xiuhong Li
2020 Symmetry  
models that are based on the symmetrical architecture of Seq2Seq, RNNSearch, transformer, generative adversarial nets, and reinforcement learning respectively.  ...  Their performances were evaluated by four widely-used metrics in this area: BLEU, pseudo, distinct, and rouge.  ...  Acknowledgments: We thank the anonymous reviewers for their valuable feedback. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/sym12111756 fatcat:5zfwgtsstvgldi3bxl4236xywq

Reinforcement learning based recommender systems: A survey [article]

M. Mehdi Afsar, Trafford Crump, Behrouz Far
2022 arXiv   pre-print
In this paper, a survey on reinforcement learning based recommender systems (RLRSs) is presented.  ...  Therefore, it can be formulated as a Markov decision process (MDP) and be solved by reinforcement learning (RL) algorithms.  ...  ACKNOWLEDGEMENTS We wish to thank the anonymous reviewers for their constructional comments on the first versions of this paper.  ... 
arXiv:2101.06286v2 fatcat:alfslgagzvek5gx5kfepxc7xae

Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning

Tulika Saha, Sriparna Saha, Pushpak Bhattacharyya, Haoran Xie
2020 PLoS ONE  
In order to integrate these multiple aspects, a Hierarchical Reinforcement Learning (HRL) specifically options based VA is proposed to learn strategies for managing multi-intent conversations.  ...  As a first step towards enabling the development of sentiment aided VA for multi-intent conversations, this paper proposes a new dataset, annotated with its corresponding intents, slot and sentiment (considering  ...  In [31], authors presented yet another variant of Deep Dyna-Q framework [28] called Budget-Conscious Scheduling-based (BCS) Deep Dyna-Q to best utilize a fixed, small number of human interactions (  ... 
doi:10.1371/journal.pone.0235367 pmid:32614929 fatcat:7yd5c4mxtfgylpuib2lxqutaia

A Survey of Deep Reinforcement Learning in Recommender Systems: A Systematic Review and Future Directions [article]

Xiaocong Chen, Lina Yao, Julian McAuley, Guanglin Zhou, Xianzhi Wang
2021 arXiv   pre-print
of the recent trends of deep reinforcement learning in recommender systems.  ...  In light of the emergence of deep reinforcement learning (DRL) in recommender systems research and several fruitful results in recent years, this survey aims to provide a timely and comprehensive overview  ...  Pseudo Dyna-Q (PDQ) [135] points out that Monte-Carlo tree search may lead to an extremely large action space and an unbounded importance weight of training samples.  ... 
arXiv:2109.03540v2 fatcat:5gwrbfcj3rc7jfkd54eseck5ga

Intelligent scaling for 6G IoE services for resource provisioning

Abdullah Alharbi, Hashem Alyami, Poongodi M, Hafiz Tayyab Rauf, Seifedine Kadry
2021 PeerJ Computer Science  
This research used IScaler, an effective model for intelligent service placement solutions and resource scaling. IScaler is considered to be made for MEC in Deep Reinforcement Learning (DRL).  ...  The paper has considered several requirements for making service placement decisions.  ...  The different matrices are used for the case of Dyna-Q, and Fig. 12 shows that a significant change and a dynamic environment can be observed for the case of resources that are primarily available for  ... 
doi:10.7717/peerj-cs.755 pmid:34805508 pmcid:PMC8576555 fatcat:p2byn3cyzvgmpbzfj6mpw3fx4a

A Survey of Exploration Methods in Reinforcement Learning [article]

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
2021 arXiv   pre-print
Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning.  ...  In this article, we provide a survey of modern exploration methods in (Sequential) reinforcement learning, as well as a taxonomy of exploration methods.  ...  Sutton (1991a) shows that in both experiments, Dyna-Q+ outperforms other variations of Dyna-Q in the setting, where the performance is measured with respect to the collected reward.  ... 
arXiv:2109.00157v2 fatcat:dlqhzwxscnfbxpt2i6rp7ovp6i

Deep Reinforcement Learning: An Overview [article]

Yuxi Li
2018 arXiv   pre-print
Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.  ...  We start with background of machine learning, deep learning and reinforcement learning.  ...  See classical Dyna-Q (Sutton, 1990).  ...  Weber et al. (2017)  ...  Andrychowicz et al.  ... 
arXiv:1701.07274v6 fatcat:x2es3yf3crhqblbbskhxelxf2q

Model-Based Reinforcement Learning for Atari [article]

Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker (+1 others)
2020 arXiv   pre-print
Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations.  ...  However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly?  ...  ACKNOWLEDGMENTS We thank Marc Bellemare and Pablo Castro for their help with Rainbow and Dopamine.  ... 
arXiv:1903.00374v4 fatcat:y6o3luqnxbhvfdc4l7yiabufni

Deep Reinforcement Learning [article]

Yuxi Li
2018 arXiv   pre-print
We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details.  ...  We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources.  ...  Lanctot et al. (2017) observe that independent RL, in which each agent learns by interacting with the environment, oblivious to other agents, can overfit the learned policies to other agents' policies  ... 
arXiv:1810.06339v1 fatcat:kp7atz5pdbeqta352e6b3nmuhy

Risk-Aware Model-Based Control

Chen Yu, Andre Rosendo
2021 Frontiers in Robotics and AI  
In comparison with other state-of-the-art reinforcement learning algorithms, we show that it produces superior results on a walking robot model.  ...  Model-Based Reinforcement Learning (MBRL) algorithms have been shown to have an advantage on data-efficiency, but often overshadowed by state-of-the-art model-free methods in performance, especially when  ...  The original Dyna-Q algorithm (Sutton, 1990) uses a model to obtain a better Q-function estimate based on the Q-learning method.  ... 
doi:10.3389/frobt.2021.617839 pmid:33778013 pmcid:PMC7990789 fatcat:v4thq6253zgjpde6zhowd6ufca

A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity [article]

Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, Enrique Munoz de Cote
2019 arXiv   pre-print
This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits.  ...  The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target  ...  Acknowledgments We would like to thank Frans Oliehoek and Daan Bloembergen for useful discussions and suggestions.  ... 
arXiv:1707.09183v2 fatcat:mnducjpn7zawpnw3u6wnhhc6k4

Deep Reinforcement Learning, a textbook [article]

Aske Plaat
2022 arXiv   pre-print
The aim of this book is to provide a comprehensive overview of the field of deep reinforcement learning.  ...  The book is written for graduate students of artificial intelligence, and for researchers and practitioners who wish to better understand deep reinforcement learning methods and their challenges.  ...  Dyna-Q uses the Q-function  ...  Algorithm 5.3 Dyna-Q [741]: Initialize 𝑄(𝑠, 𝑎) → R randomly; initialize 𝑀(𝑠, 𝑎) → R × 𝑆 randomly (⊲ Model); repeat: select 𝑠 ∈ 𝑆 randomly; 𝑎 ← 𝜋(𝑠) (⊲ 𝜋(𝑠) 𝑄(ŝ,  ... 
arXiv:2201.02135v2 fatcat:3icsopexerfzxa3eblpu5oal64