Sparse nonlinear representation for voice conversion

Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
2015 IEEE International Conference on Multimedia and Expo (ICME)
In voice conversion, sparse-representation-based methods have recently attracted attention because they are relatively unaffected by over-fitting and over-smoothing. In these approaches, voice conversion is achieved by estimating a sparse vector that determines which dictionaries of the target speaker should be used; the vector is calculated by matching the input vector against the dictionaries of the source speaker. Sparse-representation-based voice conversion methods can be broadly divided into two approaches: 1) an approach that uses raw acoustic features in the training data as parallel dictionaries, and 2) an approach that trains parallel dictionaries from the training data. We follow the latter approach and systematically estimate the parallel dictionaries using a joint-density restricted Boltzmann machine with sparse constraints. Through voice-conversion experiments, we confirmed the high performance of our method by comparing it with the conventional Gaussian mixture model (GMM)-based approach and with a non-negative matrix factorization (NMF)-based approach, which is also based on sparse representation.
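
The general mechanism described in the abstract (estimate a sparse vector against the source-speaker dictionary, then apply it to the paired target-speaker dictionary) can be illustrated with a minimal sketch. This is not the authors' joint-density restricted Boltzmann machine; it is a generic non-negative sparse coding step with fixed parallel dictionaries, in the spirit of the NMF-based baseline the abstract mentions. The function names, the `sparsity` weight, the dictionary sizes, and the random data are all assumptions made for illustration.

```python
import numpy as np

def estimate_activations(x, src_dict, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative activations h such that src_dict @ h approximates x.

    Uses multiplicative updates for the KL divergence with an L1 penalty
    on h (hypothetical helper, not taken from the paper).
    """
    n_atoms = src_dict.shape[1]
    h = np.full(n_atoms, 1.0 / n_atoms)   # uniform non-negative start
    col_sums = src_dict.sum(axis=0)       # equals src_dict.T @ ones
    for _ in range(n_iter):
        approx = src_dict @ h + eps       # current reconstruction of x
        h *= (src_dict.T @ (x / approx)) / (col_sums + sparsity)
    return h

def convert(x, src_dict, tgt_dict, **kwargs):
    """Map a source feature vector to the target speaker's space by
    reusing the source activations with the paired target dictionary."""
    h = estimate_activations(x, src_dict, **kwargs)
    return tgt_dict @ h, h

# Toy data (assumed): 20-dimensional non-negative features, 50 paired atoms.
rng = np.random.default_rng(0)
src_dict = rng.random((20, 50)) + 1e-3
tgt_dict = rng.random((20, 50)) + 1e-3

# A source frame built from two atoms, so a sparse explanation exists.
true_h = np.zeros(50)
true_h[[3, 17]] = [1.0, 0.5]
x = src_dict @ true_h

y, h = convert(x, src_dict, tgt_dict)
```

Here `y` is the converted frame and `h` the shared activation vector. Column `k` of `src_dict` and column `k` of `tgt_dict` are assumed to form a parallel pair, which is what lets the same activations be reused across speakers.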
doi:10.1109/icme.2015.7177437 dblp:conf/icmcs/NakashikaTA15 fatcat:r5bn6swx4bgnjeoc5arofxsnky