
Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed

Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang, Xiaodong He, Bowen Zhou
2020-10-25 · Interspeech 2020 (ISCA)
In this paper, we propose Efficient WaveGlow (EWG), a flow-based generative model serving as an efficient neural vocoder. ... To reduce the number of model parameters and enhance the speed without sacrificing the quality of the synthesized speech, EWG improves WaveGlow in three aspects. ... Efficient WaveGlow EWG follows the normalizing flow structure of Glow with an improved transform network shown in Fig. 1. Three modifications of the transform network are proposed in this paper. ...
doi:10.21437/interspeech.2020-2172 · dblp:conf/interspeech/SongXZZ0Z20 · fatcat:6mnww6x7vjcn7doosqj4dwvdum
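The Glow-style coupling structure the snippet refers to can be sketched in a few lines. This is a generic affine coupling step, not EWG's actual transform network; the `transform_net` below is a toy stand-in for the paper's WaveNet-like network, which can be arbitrary because invertibility never requires inverting it:

```python
import numpy as np

def transform_net(x_a):
    # Toy stand-in for the WaveNet-like transform network: any function of
    # x_a works, since the coupling step only needs to invert the affine part.
    log_s = np.tanh(x_a)          # predicted log-scale
    t = 0.5 * x_a                 # predicted shift
    return log_s, t

def coupling_forward(x):
    # Split channels in half; transform one half conditioned on the other.
    x_a, x_b = np.split(x, 2)
    log_s, t = transform_net(x_a)
    y_b = x_b * np.exp(log_s) + t
    return np.concatenate([x_a, y_b]), log_s.sum()  # log|det J| = sum(log_s)

def coupling_inverse(y):
    y_a, y_b = np.split(y, 2)
    log_s, t = transform_net(y_a)
    x_b = (y_b - t) * np.exp(-log_s)
    return np.concatenate([y_a, x_b])

x = np.random.randn(8)
y, logdet = coupling_forward(x)
assert np.allclose(coupling_inverse(y), x)  # exactly invertible
```

The cheap Jacobian log-determinant is what makes maximum-likelihood training of such flows tractable.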

Mathematical Vocoder Algorithm : Modified Spectral Inversion for Efficient Neural Speech Synthesis [article]

Hyun Gon Ryu, Jeong-Hoon Kim, Simon See
2021-06-16 · arXiv (pre-print)
The main benefit of using our proposed method is that it excludes the training stage of the neural vocoder from the end-to-end speech synthesis model. ... In this work, we propose a new mathematical vocoder algorithm (modified spectral inversion) that generates a waveform from acoustic features without phase estimation. ... In the future, we plan to improve performance through parallel work. Algorithm 3 with a large hop length can synthesize speech at 59.6 MHz with 1024/1022. ...
arXiv:2106.03167v2 · fatcat:5tckueeirvadbktdo75r2cpr6m
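The snippet does not spell out the paper's modified spectral inversion itself. As a hedged illustration of the family it belongs to, here is the classic Griffin-Lim iteration, which likewise recovers a waveform from a magnitude spectrogram without any trained vocoder or explicit phase estimate:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=256):
    """Estimate a waveform from an STFT magnitude alone by iteratively
    refining a random initial phase (classic Griffin-Lim)."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg)   # back to time domain
        _, _, Z = stft(x, nperseg=nperseg)           # re-analyze
        n = min(Z.shape[1], mag.shape[1])            # guard frame-count drift
        phase[:, :n] = np.exp(1j * np.angle(Z[:, :n]))  # keep phase only
    _, x = istft(mag * phase, nperseg=nperseg)
    return x

# Toy check: invert the magnitude spectrogram of a 440 Hz tone.
t = np.arange(8192) / 16000
_, _, Z = stft(np.sin(2 * np.pi * 440 * t), nperseg=256)
wav = griffin_lim(np.abs(Z))
```

Each iteration keeps the estimated phase but snaps the magnitude back to the target, so the signal converges toward one consistent with the given spectrogram.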

Distilling the Knowledge from Conditional Normalizing Flows [article]

Dmitry Baranchuk, Vladimir Aliev, Artem Babenko
2021-08-05 · arXiv (pre-print)
However, they have to be carefully designed to represent invertible functions with efficient Jacobian determinant calculation.  ...  In this work, we investigate whether one can distill flow-based models into more efficient alternatives.  ...  The closest works to ours distill knowledge from an expensive autoregressive neural vocoder to normalizing flows with parallel inference.  ... 
arXiv:2106.12699v3 · fatcat:7fn3hnwtyrbqbhdvkozp2rprfm
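The distillation setup the abstract describes, training a cheap student only on a teacher's input/output pairs, can be sketched in miniature. Everything below is illustrative: the `teacher` is a toy stand-in for an expensive trained model (e.g. a flow's inverse pass), and the student is a small least-squares polynomial rather than a neural network:

```python
import numpy as np

# Teacher: toy stand-in for an expensive trained model. The student never
# sees its internals, only its outputs on sampled inputs.
def teacher(z):
    return np.tanh(z) + 0.1 * z

rng = np.random.default_rng(0)
Z = rng.normal(size=(10000, 1))       # sample inputs
Y = teacher(Z)                        # query the teacher

# Student: a cheap polynomial regressor fit by least squares on the pairs.
Phi = np.hstack([Z ** k for k in range(1, 6)])
w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
pred = Phi @ w                        # student output closely tracks Y
```

The point of the sketch is only the training signal: once distilled, the student has no invertibility constraint, which is why it can be made more efficient than the flow.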

PPSpeech: Phrase based Parallel End-to-End TTS System [article]

Yahuan Cong, Ran Zhang, Jian Luan
2020-08-06 · arXiv (pre-print)
The speed advantage increases with the growth of sentence length.  ...  By this method, we can achieve both high quality and high efficiency.  ...  WaveGlow vocoder WaveGlow is a NN-based vocoder that produces audio by sampling from a distribution.  ... 
arXiv:2008.02490v1 · fatcat:mmwl6xe43rh6znvx6wqdqkeo2m

Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition [article]

Zhengxi Liu, Yanmin Qian
2021-06-25 · arXiv (pre-print)
While GAN-based neural vocoders have been shown to be computationally much more efficient than those based on autoregressive predictions, the real-time generation of the highest quality audio on CPU is still ... Recent studies have shown that neural vocoders based on generative adversarial networks (GAN) can generate audio with high quality. ... While these neural vocoders based on parallelization significantly improve the inference speed, these improvements are only applicable when the model runs inference on a GPU. ...
arXiv:2106.13419v1 · fatcat:f2rhfgz4trffpkr7p2l6e3ao5e

FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction [article]

Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu
2020-09-03 · arXiv (pre-print)
Therefore, it can significantly improve the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation.  ...  The LPCNet, a recently proposed neural vocoder which utilized the linear predictive characteristic of speech signal in the WaveRNN architecture, can generate high quality speech with a speed faster than  ...  These two members in Intel not only provided the guidance on how to get good performance on the Intel(R) Xeon(R) Scalable Processors, but also helped to optimize/validate our algorithm with Intel(R) Deep  ... 
arXiv:2005.05551v2 · fatcat:ztn6ustma5b6zfiucp46ysxpie
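FeatherWave, like LPCNet before it, exploits the linear-predictive structure of speech so the network only has to model the prediction residual. The numpy sketch below shows plain single-band LPC analysis, not the paper's multi-band scheme; the order and test signal are illustrative:

```python
import numpy as np

def lpc_coeffs(x, order=4):
    """Linear-prediction coefficients via the autocorrelation method."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(x, a):
    """Prediction residual: the part a neural sample model still must explain."""
    order = len(a)
    pred = np.array([a @ x[n - order:n][::-1] for n in range(order, len(x))])
    return x[order:] - pred

rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 440 * np.arange(2048) / 16000) + 0.01 * rng.normal(size=2048)
e = lp_residual(sig, lpc_coeffs(sig))
# The residual carries far less energy than the signal itself.
```

Because the residual is near-white and low-energy, a much smaller network suffices to model it, which is where the efficiency claim comes from.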

Evaluation of Tacotron Based Synthesizers for Spanish and Basque

Víctor García, Inma Hernáez, Eva Navas
2022 · Applied Sciences (MDPI AG)
The system applies Tacotron 2 to compute mel-spectrograms from the input sequence, followed by WaveGlow as neural vocoder to obtain the audio signals from the spectrograms. ... To mitigate the problem, we implemented a guided attention providing the system with the explicit duration of the phonemes. ... This method improves the model stability while also enhancing the training efficiency. ...
doi:10.3390/app12031686 · doaj:9f0e9e5fd8c4447db8dc6c2b4bf67d2e · fatcat:6hojmlk3pjazrls55xk3evm364

FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction

Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu
2020-10-25 · Interspeech 2020 (ISCA)
Therefore, it can significantly improve the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation.  ...  The LPCNet, a recently proposed neural vocoder which utilized the linear predictive characteristic of speech signal in the WaveRNN architecture, can generate high quality speech with a speed faster than  ...  These two members in Intel not only provided the guidance on how to get good performance on the Intel(R) Xeon(R) Scalable Processors, but also helped to optimize/validate our algorithm with Intel(R) Deep  ... 
doi:10.21437/interspeech.2020-1156 · dblp:conf/interspeech/TianZLCL20 · fatcat:jymikrkdgfdadicng7trlrly7i

An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures [article]

Dengfeng Ke and Yuxing Lu and Xudong Liu and Yanyan Xu and Jing Sun and Cheng-Hao Cai
2021-08-06 · arXiv (pre-print)
In this work, in order to explore how to improve the quality and efficiency of singing voice synthesis, we use encoder-decoder neural models and a number of vocoders to achieve singing voice ... With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. ... Such improvement can accelerate the convergence speed of the model and enhance the stability. ...
arXiv:2108.03008v1 · fatcat:p73gewntybebbi2pggotyyqtwy

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search [article]

Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon
2020-10-23 · arXiv (pre-print)
Glow-TTS obtains an order-of-magnitude speed-up over the autoregressive model, Tacotron 2, at synthesis with comparable speech quality.  ...  [10] introduced these transformations for speech synthesis to overcome the slow sampling speed of an autoregressive vocoder, WaveNet [29] .  ...  We also measure the total inference time for synthesizing 1-minute speech in an end-to-end setting with Glow-TTS and WaveGlow.  ... 
arXiv:2005.11129v2 · fatcat:efmkcdp6j5hwjbnd22nqir6pce
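The monotonic alignment search in Glow-TTS is a small dynamic program. The sketch below follows the published idea (best monotonic, non-skipping frame-to-token alignment under per-frame likelihoods) in plain numpy; the toy likelihood matrix at the end is illustrative:

```python
import numpy as np

def monotonic_alignment_search(log_p):
    """log_p[t, s]: log-likelihood of mel frame t under text token s (T >= S).
    Returns, via dynamic programming, the monotonic alignment assigning one
    token per frame that maximizes the total log-likelihood."""
    T, S = log_p.shape
    Q = np.full((T, S), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for t in range(1, T):
        for s in range(min(t + 1, S)):   # token s needs at least s prior frames
            prev = Q[t - 1, s] if s == 0 else max(Q[t - 1, s], Q[t - 1, s - 1])
            Q[t, s] = log_p[t, s] + prev
    align = np.empty(T, dtype=int)       # backtrack from the final token
    s = S - 1
    for t in range(T - 1, -1, -1):
        align[t] = s
        if t > 0 and s > 0 and Q[t - 1, s - 1] >= Q[t - 1, s]:
            s -= 1
    return align

# Frames sweeping across three tokens should yield a monotonic alignment.
log_p = -np.abs(np.linspace(0, 2, 7)[:, None] - np.arange(3)[None, :])
```

Each frame either stays on the current token or advances by one, which is exactly the monotonicity constraint that makes the search exact and cheap.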

A Survey on Neural Speech Synthesis [article]

Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu
2021-07-23 · arXiv (pre-print)
With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. ... We focus on the key components in neural TTS, including text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive ... Some vocoders based on bipartite transforms include WaveGlow [279] and FloWaveNet [163], which achieve high voice quality and fast inference speed. ...
arXiv:2106.15561v3 · fatcat:pbrbs6xay5e4fhf4ewlp7qvybi

LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading [article]

Leyuan Qu, Cornelius Weber, Stefan Wermter
2021-12-09 · arXiv (pre-print)
We propose LipSound2, which consists of an encoder-decoder architecture and location-aware attention mechanism to map face image sequences to mel-scale spectrograms directly without requiring any human ... To verify the generalizability of the proposed method, we then fine-tune the pre-trained model on domain-specific datasets (GRID, TCD-TIMIT) for English speech reconstruction and achieve a significant improvement ... In this paper, we use WaveGlow to transform mel-spectrograms into waveforms, trained with a cosine learning-rate decay strategy with an initial value of 0.001. ...
arXiv:2112.04748v1 · fatcat:nkecrtplr5h3laiwpsd6gxjnqu

Expediting TTS Synthesis with Adversarial Vocoding

Paarth Neekhara, Chris Donahue, Miller Puckette, Shlomo Dubnov, Julian McAuley
2019-09-15 · Interspeech 2019 (ISCA)
baseline and faster speeds than state-of-the-art vocoding methods. • We show that our method can effectively vocode highly-compressed (13:1) audio feature representations. • We show that our method improves ... Our method is more than 1000× faster than the autoregressive WaveNet vocoder and 2.5× faster than the WaveGlow vocoder. ...
doi:10.21437/interspeech.2019-3099 · dblp:conf/interspeech/NeekharaDPDM19 · fatcat:2cse7dwvbrdhbg37qyzeboqejq

GAN Vocoder: Multi-Resolution Discriminator Is All You Need [article]

Jaeseong You, Dalhyun Kim, Gyuhyeon Nam, Geumbyeol Hwang, Gyeongsu Chae
2021-08-23 · arXiv (pre-print)
Several of the latest GAN-based vocoders show remarkable achievements, outperforming autoregressive and flow-based competitors in both qualitative and quantitative measures while synthesizing orders of  ...  We experimentally test the hypothesis by evaluating six different generators paired with one shared multi-resolution discriminating framework.  ...  MelGAN was the first demonstration that a GAN vocoder can achieve a performance that comes close to that of WaveGlow [5] .  ... 
arXiv:2103.05236v2 · fatcat:b7yeyqxdfzfu7fqnw2r7h7s47y
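The multi-resolution idea in the title can be illustrated with a multi-resolution STFT comparison. Real GAN-vocoder discriminators operate on learned features rather than a fixed L1 spectral distance, and the FFT sizes below are illustrative defaults, not the paper's configuration:

```python
import numpy as np
from scipy.signal import stft

def multi_res_stft_loss(x, y, fft_sizes=(512, 1024, 2048)):
    """L1 distance between magnitude spectrograms at several analysis
    resolutions: the kind of evidence a multi-resolution discriminator
    (or auxiliary STFT loss) inspects. Short windows catch transients,
    long windows catch fine harmonic structure."""
    loss = 0.0
    for n in fft_sizes:
        _, _, Sx = stft(x, nperseg=n, noverlap=n - n // 4)
        _, _, Sy = stft(y, nperseg=n, noverlap=n - n // 4)
        loss += np.mean(np.abs(np.abs(Sx) - np.abs(Sy)))
    return loss / len(fft_sizes)

rng = np.random.default_rng(0)
a = rng.normal(size=8192)
assert multi_res_stft_loss(a, a) == 0.0   # identical signals: zero distance
```

Combining several window lengths avoids the time-frequency trade-off of any single STFT, which is the intuition behind the paper's hypothesis.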

Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN [article]

Congyi Wang, Yu Chen, Bin Wang, Yi Shi
2021-03-29 · arXiv (pre-print)
Moreover, PRLSGAN is a general-purpose framework that can be combined with any GAN-based neural vocoder to enhance its generation quality. ... to fool the discriminator, leading to improved generation quality. ... Compared with conventional vocoders ([2], [3]), neural vocoders can significantly enhance the speech synthesis quality of the current text-to-speech (TTS) system. ...
arXiv:2103.14245v2 · fatcat:zzzjs7xdsffjna5qrv5xytvl7y
Showing results 1–15 of 36.