A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017.
File type: application/pdf
Semi-supervised learning of language model using unsupervised topic model
2010 IEEE International Conference on Acoustics, Speech and Signal Processing
We present a semi-supervised learning (SSL) method for building domain-specific language models (LMs) from general-domain data using probabilistic latent semantic analysis (PLSA). The proposed technique first performs topic decomposition (TD) on the combined domain-specific and general-domain data. It then derives the latent topic distribution of the domain of interest and computes domain-specific word n-gram counts with a PLSA-style mixture model. Finally, it uses traditional n-gram
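To make the mixture-model step concrete, here is a minimal sketch of how general-domain word counts might be reweighted toward a target domain using PLSA-style quantities. All distributions, words, and values below are illustrative assumptions, not the paper's actual data or implementation; the general model is approximated by a uniform topic mixture for simplicity.

```python
# Toy general-domain word counts (assumed values).
general_counts = {"stock": 40, "goal": 60}

# Assumed PLSA topic-word distributions P(w|z) from topic decomposition.
p_w_given_z = {
    "finance": {"stock": 0.8, "goal": 0.2},
    "sports":  {"stock": 0.1, "goal": 0.9},
}

# Assumed domain-specific topic mixture P(z|domain), e.g. obtained by
# folding the in-domain data into the trained PLSA model.
p_z_domain = {"finance": 0.9, "sports": 0.1}

def domain_weight(word):
    """Ratio of P(w|domain) under the domain mixture to P(w) under a
    uniform topic mixture standing in for the general model."""
    p_domain = sum(p_z_domain[z] * p_w_given_z[z][word] for z in p_z_domain)
    p_general = sum(p_w_given_z[z][word] / len(p_w_given_z) for z in p_w_given_z)
    return p_domain / p_general

# Reweight general-domain counts toward the target domain; these
# adapted counts could then feed a traditional n-gram estimator.
adapted_counts = {w: c * domain_weight(w) for w, c in general_counts.items()}
```

With a finance-heavy topic mixture, the count for "stock" is boosted while the count for "goal" is suppressed, which is the intended effect of deriving domain-specific counts from general-domain data.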
doi:10.1109/icassp.2010.5494940
dblp:conf/icassp/BaiHML10
fatcat:nipc56dbdze2rdcus32fyhsrre