12 Hits in 4.3 sec

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems [article]

Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng
2017 arXiv   pre-print
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.  ...  Our algorithm learns much faster than common exploration strategies such as ϵ-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones.  ...  See appendix for more details. BBQ-networks We are now ready to introduce BBQN, our algorithm for learning dialogue policies with deep learning models.  ... 
arXiv:1608.05081v4 fatcat:3meqci2hnbekdjspzwfw4nplbq

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems [article]

Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng
2017 arXiv   pre-print
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.  ...  Our algorithm learns much faster than common exploration strategies such as ϵ-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones.  ...  See Appendix Bayes-by-Backprop for more details. BBQ-networks We are now ready to introduce BBQN, our algorithm for learning dialogue policies with deep learning models.  ... 
arXiv:1711.05715v2 fatcat:bfswvf466fdnhaoxec2asstetm

Neural Approaches to Conversational AI

Jianfeng Gao, Michel Galley, Lihong Li
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
us/research/publication/neural-approaches-toconversational-ai/ We thank Lihong Li, Bill Dolan and Yun-Nung (Vivian) Chen for contributing slides. 2  ...  : • How to explore efficiently • to collect data for new slots • When deep models are used Bayes-by-Backprop Q (BBQ) network [Lipton+ 18] BBQ-learning of network params θ = (μ, σ²): θ* = arg min KL  ...  • Deep RL for dialogue policy learning • Building dialog systems via machine learning and machine teaching • Part 4: Fully data-driven conversation models and chatbots An Example Dialogue with Movie-Bot  ... 
doi:10.1145/3209978.3210183 dblp:conf/sigir/GaoG018 fatcat:pnhrb5jgdfgnxac3hxy52a65pm

What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions [article]

Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton van den Hengel
2018 arXiv   pre-print
We evaluate our approach on two goal-oriented dialogue datasets, one for visual-based collaboration task and the other for a negotiation-based task.  ...  We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output  ...  Early goal-oriented dialogue systems [33, 36] RL in dialogue generation Reinforcement learning (RL) has been applied in many dialogue settings. Li et al.  ... 
arXiv:1812.06401v1 fatcat:wykt2pj455espe3kvxpx6v3z7i

Show Us the Way: Learning to Manage Dialog from Demonstrations [article]

Gabriel Gordon-Hall, Philip John Gorinski, Gerasimos Lampouras, Ignacio Iacobacci
2020 arXiv   pre-print
At the core of our system is a reinforcement learning algorithm which uses Deep Q-learning from Demonstrations to learn a dialog policy with the help of expert examples.  ...  Our proposed dialog system adopts a pipeline architecture, with distinct components for Natural Language Understanding, Dialog State Tracking, Dialog Management and Natural Language Generation.  ...  Multiwoz - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling.  ...  L. 2018. Bbq-networks: Efficient exploration in deep re-  ... 
arXiv:2004.08114v1 fatcat:ps57ibfqufg4hhkeuqmvksd77m

A User Simulator for Task-Completion Dialogues [article]

Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, Yun-Nung Chen
2017 arXiv   pre-print
Despite widespread interests in reinforcement-learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress.  ...  Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge.  ...  In Interspeech, 2016. Zachary C Lipton, Jianfeng Gao, Lihong Li, Xiujun Li, Faisal Ahmed, and Li Deng. Efficient exploration for dialogue policy learning with BBQ networks & replay buffer spiking.  ... 
arXiv:1612.05688v3 fatcat:5rqtmsuyzzainhajzga7uza4qi

Recent Advances and Challenges in Task-oriented Dialog System [article]

Zheng Zhang, Ryuichi Takanobu, Qi Zhu, Minlie Huang, Xiaoyan Zhu
2020 arXiv   pre-print
We also discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog  ...  In this paper, we survey recent advances and challenges in task-oriented dialog systems.  ...  For example, improved RL methods including ACER [93] and BBQ-Networks [36] are proposed to enhance sample efficiency.  ... 
arXiv:2003.07490v3 fatcat:powcuixxargkbp57kpwmjict3y

Deep Reinforcement Learning [article]

Yuxi Li
2018 arXiv   pre-print
We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details.  ...  We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources.  ...  Lanctot et al. (2017) observe that independent RL, in which each agent learns by interacting with the environment, oblivious to other agents, can overfit the learned policies to other agents' policies  ... 
arXiv:1810.06339v1 fatcat:kp7atz5pdbeqta352e6b3nmuhy

Randomized Value Functions via Multiplicative Normalizing Flows [article]

Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent
2019 arXiv   pre-print
Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces.  ...  In this work, we leverage recent advances in variational Bayesian neural networks and combine these with traditional Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to achieve randomized  ...  We also note that BBQN was proposed for Task-Oriented Dialogue Systems and was not evaluated on standard RL benchmarks. Furthermore, BBQN can be seen simply as a sub-case of MNF-DQN.  ... 
arXiv:1806.02315v3 fatcat:6nbtlhjvo5e4pbzojhvudaroje

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [article]

Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh
2021 arXiv   pre-print
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.  ...  In addition, UWAC out-performs existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations  ...  Bbq-networks: Efficient exploration in deep reinforcement learning for task-oriented dialogue systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.  ... 
arXiv:2105.08140v1 fatcat:pn4jm6vgifemvfryytwi5m75c4

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning [article]

David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek
2019 arXiv   pre-print
Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning.  ...  Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure.  ...  Acknowledgements We thank Matej Balog and the anonymous reviewers for their helpful comments and suggestions. Jiri Hron acknowledges support by a Nokia CASE Studentship.  ... 
arXiv:1810.06530v5 fatcat:v5hr3hie5ffkfpoctnc3qrc4eq

Neural Contextual Bandits with UCB-based Exploration [article]

Dongruo Zhou, Lihong Li, Quanquan Gu
2020 arXiv   pre-print
) of reward for efficient exploration.  ...  We also show the algorithm is empirically competitive against representative baselines in a number of benchmarks.  ...  Acknowledgement We would like to thank the anonymous reviewers for their helpful comments. This research was sponsored in part by the National Science Foundation IIS-1904183 and IIS-1906169.  ... 
arXiv:1911.04462v3 fatcat:3u6erwajyfbnxpn3wvehrk6j3u