A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017.
The file type is application/pdf.
On-line policy optimisation of Bayesian spoken dialogue systems via human interaction
2013
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating the use of a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However,
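The speed-up described in the abstract comes from estimating expected returns with a Gaussian process instead of a tabular or linear function approximator, so that each dialogue constrains the value estimate over the whole belief space. As an illustration only (not the paper's actual GP-SARSA algorithm), the following sketch shows the core GP machinery: a squared-exponential kernel over hypothetical belief-action feature vectors and the standard GP posterior mean used to predict returns; the feature vectors, returns, kernel hyperparameters, and noise level are all invented for the example.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior_mean(X_train, y_train, X_query, noise=1e-2):
    """GP regression posterior mean: k(X*, X) (K + noise*I)^-1 y."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_train, X_query)      # cross-covariances
    alpha = np.linalg.solve(K, y_train)        # (K + noise*I)^-1 y
    return K_star.T @ alpha

# Toy belief-action feature vectors and their observed returns
# (purely illustrative values, not from the paper).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.5])

# Predicted returns at the training points: with low noise the GP
# nearly interpolates the observed values.
q = gp_posterior_mean(X, y, X)
```

Because the kernel correlates nearby feature vectors, a single observed return also updates the estimate at unvisited points, which is the property that reduces the number of dialogues needed for learning.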
doi:10.1109/icassp.2013.6639297
dblp:conf/icassp/GasicBHKSTTY13
fatcat:fus6y7a6uneinbtiq6ie4qstty