A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis
[article]
2021
arXiv
pre-print
To scale neural speech synthesis to various real-world languages, we present a multilingual end-to-end framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts. Besides strong results on 40+ languages, the framework demonstrates capabilities to adapt to new languages under extreme low-resource and even few-shot scenarios of merely 40s transcribed recording, without the need of per-language resources like lexicon, extra corpus, auxiliary models, or linguistic
arXiv:2103.03541v2
fatcat:z7xtjh723rey3gyrzhrrepj3ea