Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
[article]
2019
arXiv
pre-print
Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces and the difficulty in specifying the reward function. We address such problems using clustered actions instead of infinite actions, and a simple but promising reward function based on human-likeness scores derived from human-human dialogue data. We train Deep Reinforcement Learning (DRL) agents using chitchat data in raw text, without any manual annotations.
arXiv:1908.10331v1
fatcat:jtgyrwly4jfaxfeb7u5mbbxz7a
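
The abstract does not say how sentences are grouped into clustered actions, so the following is only a minimal sketch of the general idea: replace an unbounded set of possible responses with a small, fixed set of cluster IDs that an agent can choose among. It assumes bag-of-words vectors and plain k-means with a deterministic initialisation; the names `embed`, `kmeans` and `action_of`, and the toy sentences, are hypothetical and are not taken from the paper.

```python
from collections import Counter


def embed(sentence, vocab):
    """Bag-of-words count vector (an assumed stand-in for a sentence embedding)."""
    counts = Counter(sentence.lower().split())
    return [float(counts[w]) for w in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Toy corpus (hypothetical); real training data would be raw chitchat text.
sentences = [
    "hello there",
    "i like pizza",
    "hello hello friend",
    "i like pasta",
]
vocab = sorted({w for s in sentences for w in s.lower().split()})
points = [embed(s, vocab) for s in sentences]
assign, centroids = kmeans(points, k=2)

# Each cluster ID is one discrete action; its members are candidate responses.
actions = {}
for sentence, cluster_id in zip(sentences, assign):
    actions.setdefault(cluster_id, []).append(sentence)


def action_of(sentence):
    """Map any sentence to the ID of its nearest cluster (words outside vocab are ignored)."""
    v = embed(sentence, vocab)
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
```

With this setup an agent would only need to pick one of `k` cluster IDs per turn, and a concrete response could then be drawn from `actions[cluster_id]`. The choice of `k`, the embedding, and the clustering algorithm are all design decisions this sketch leaves open.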