A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is
Recent works on voice conversion (VC) focus on preserving the rhythm and the intonation as well as the linguistic content. To preserve these features from the source, we decompose current non-parallel VC systems into two encoders and one decoder. We analyze each module with several experiments and reassemble the best components to propose Assem-VC, a new state-of-the-art any-to-many non-parallel VC system. We also examine that PPG and Cotatron features are speaker-dependent, and attempt toarXiv:2104.00931v2 fatcat:yxvyss4zireutmcquc4pkmpniy