NAM-to-speech conversion with Gaussian mixture models

Tomoki Toda, Kiyohiro Shikano
2005 Interspeech 2005   unpublished
In order to realize a new style of human communication using Non-Audible Murmur (NAM), which cannot be heard by people around the speaker, we perform conversion from NAM to ordinary speech (NAM-to-Speech). NAM-to-Speech could make possible a "non-speech telephone", a technique for communicating with each other by talking in NAM and hearing in speech. In this paper, we apply a statistical conversion method based on a Gaussian Mixture Model (GMM) to NAM-to-Speech. In advance, we train GMMs for ... senting correlations between acoustic features of NAM and those of speech using 50 utterance pairs of NAM and speech. In the conversion, we estimate spectral and F0 features of speech based on a maximum likelihood criterion, and then synthesize the converted speech with a vocoder. Results of subjective evaluations of intelligibility and naturalness demonstrate that NAM-to-Speech with GMMs can convert NAM to a more consistently natural voice.
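As a rough illustration of how a GMM over paired source and target features can drive conversion, the sketch below maps one source (NAM) feature vector to a target (speech) feature vector. It is a simplified stand-in and not the method of the paper: it uses the per-frame conditional mean of a joint-density GMM rather than the maximum likelihood criterion the abstract describes, it assumes the GMM parameters have already been trained on stacked [source; target] vectors, and the name `convert_frame` and all toy parameter values are hypothetical.

```python
import numpy as np

def convert_frame(x, weights, means, covs, dx):
    """Map a source vector x (length dx) to the conditional mean of the
    target under a joint GMM over stacked [source; target] vectors.

    weights: (M,) mixture weights
    means:   (M, dx + dy) joint means
    covs:    (M, dx + dy, dx + dy) joint full covariances
    NOTE: conditional-mean mapping is an assumption made for brevity here.
    """
    n_mix = len(weights)
    log_post = np.empty(n_mix)
    cond_means = []
    for m in range(n_mix):
        mu_x, mu_y = means[m][:dx], means[m][dx:]
        s_xx = covs[m][:dx, :dx]   # source-source covariance block
        s_yx = covs[m][dx:, :dx]   # target-source cross-covariance block
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log of weight times the marginal Gaussian density of x
        log_post[m] = np.log(weights[m]) - 0.5 * (
            diff @ sol + logdet + dx * np.log(2.0 * np.pi)
        )
        # Conditional mean of the target given x for this component
        cond_means.append(mu_y + s_yx @ sol)
    # Normalize to component posteriors P(m | x) in a numerically stable way
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ np.array(cond_means)

# Toy usage with made-up parameters: one component, 1-D source, 1-D target.
toy_weights = np.array([1.0])
toy_means = np.array([[0.0, 0.0]])
toy_covs = np.array([[[1.0, 0.5], [0.5, 1.0]]])
converted = convert_frame(np.array([2.0]), toy_weights, toy_means, toy_covs, dx=1)
```

In a full system the converted spectral and F0 features for every frame would then be passed to a vocoder, as the abstract notes; that step is not sketched here.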
doi:10.21437/interspeech.2005-611 fatcat:pqa5ltlfkfghdlm7is7zvh3ah4