750 Hits in 3.8 sec

Pushing the Limits of Non-Autoregressive Speech Recognition [article]

Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan
2021-06-16 · arXiv · pre-print
We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. ... We combine recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. ... Conclusion: We combine recent advancements in end-to-end speech recognition, and push the limits of non-autoregressive speech recognition. ...
arXiv:2104.03416v3 · fatcat:muqaw7ua5bfdncgccbwjfzunda
Fulltext: https://web.archive.org/web/20210620021122/https://arxiv.org/pdf/2104.03416v3.pdf · https://arxiv.org/abs/2104.03416v3

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [article]

Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe
2021-10-09 · arXiv · pre-print
... state-of-the-art (SOTA) recognition performance. ... In this paper, we focus on the general applications of pretrained speech representations, on advanced end-to-end automatic speech recognition (E2E-ASR) models. ... Specifically, it used the Bridges system [50], which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). ...
arXiv:2110.04590v1 · fatcat:p4peb5urpzaxja62cgalfjnyuy
Fulltext: https://web.archive.org/web/20211013102948/https://arxiv.org/pdf/2110.04590v1.pdf · https://arxiv.org/abs/2110.04590v1

FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA

Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, Farinaz Koushanfar
2019 · 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), IEEE
WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. ... While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU. ... This has led to state-of-the-art performance in text-to-speech synthesis [2], [7], [17], [18], speech recognition [19], and other audio generation settings [1], [3], [4]. ...
doi:10.1109/iccad45719.2019.8942122 · dblp:conf/iccad/HussainJNKK19 · fatcat:s6jpod255jdjjef73dqjoe6vf4
Fulltext (not primary version): https://web.archive.org/web/20200321065946/https://arxiv.org/pdf/2002.04971v1.pdf · https://doi.org/10.1109/iccad45719.2019.8942122

Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks [article]

Sercan O. Arik, Heewoo Jun, Gregory Diamos
2018-08-20 · arXiv · pre-print
For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality.  ...  We demonstrate that MCNN constitutes a very promising approach for high-quality speech synthesis, without any iterative algorithms or autoregression in computations.  ...  It is originally constructed for automatic speech recognition and its audio quality is lower compared to speech synthesis datasets.  ... 
arXiv:1808.06719v1 · fatcat:cugkvqp55fgbzab3gjxdufp27u
Fulltext: https://web.archive.org/web/20200828205301/https://arxiv.org/pdf/1808.06719v1.pdf · https://arxiv.org/abs/1808.06719v1

Recent Advances in End-to-End Automatic Speech Recognition [article]

Jinyu Li
2022-02-02 · arXiv · pre-print
Recently, the speech community is seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). ... While E2E models achieve the state-of-the-art results in most benchmarks in terms of ASR accuracy, hybrid models are still used in a large proportion of commercial ASR systems at the current time. ... A) Non-Autoregressive Models: While most E2E models use autoregressive (AR) modeling to predict target tokens in a left-to-right manner as Eq. (4), there is a recent trend of using non-autoregressive ...
arXiv:2111.01690v2 · fatcat:6pktwep34jdvjklw4gkri4yn4y
Fulltext: https://web.archive.org/web/20220205003822/https://arxiv.org/pdf/2111.01690v2.pdf · https://arxiv.org/abs/2111.01690v2
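The snippet from this survey contrasts autoregressive (AR) and non-autoregressive (NAR) decoding. As a reading aid (standard notation, not copied from the survey itself), the two factorizations differ in whether each output token conditions on previously emitted tokens:

```latex
% AR decoding: tokens are emitted left-to-right, each conditioned on its predecessors
P(y \mid x) = \prod_{t=1}^{T} P\left(y_t \mid y_{<t},\, x\right)

% NAR decoding: tokens are predicted in parallel, conditionally independent given x
P(y \mid x) = \prod_{t=1}^{T} P\left(y_t \mid x\right)
```

Dropping the $y_{<t}$ dependency is what lets NAR models decode all tokens in parallel, trading some accuracy for inference speed.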

A Survey on Neural Speech Synthesis [article]

Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu
2021-07-23 · arXiv · pre-print
With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. ... applications in the industry. ... Figure 5: 1) Autoregressive or non-autoregressive. ...
arXiv:2106.15561v3 · fatcat:pbrbs6xay5e4fhf4ewlp7qvybi
Fulltext: https://web.archive.org/web/20210727181001/https://arxiv.org/pdf/2106.15561v3.pdf · https://arxiv.org/abs/2106.15561v3

Back from the future: bidirectional CTC decoding using future information in speech recognition [article]

Namkyu Jung, Geonmin Kim, Han-Gyu Kim
2021-10-07 · arXiv · pre-print
The proposed method based on bi-directional beam search takes advantage of the CTC greedy decoding output to represent the noisy future information. ... Experiments on the Librispeech dataset demonstrate the superiority of our proposed method compared to baselines using unidirectional decoding. ... Introduction: In recent years, the performance of automatic speech recognition (ASR) systems has seen dramatic improvements due to the application of deep learning and the use of large-scale datasets ...
arXiv:2110.03326v1 · fatcat:fc7lcwzhanf7bo62s7cwpsam5y
Fulltext: https://web.archive.org/web/20211009203338/https://arxiv.org/pdf/2110.03326v1.pdf · https://arxiv.org/abs/2110.03326v1

The USYD-JD Speech Translation System for IWSLT 2021 [article]

Liang Ding, Di Wu, Dacheng Tao
2021-07-24 · arXiv · pre-print
This paper describes the University of Sydney JD's joint submission to the IWSLT 2021 low resource speech translation task. ... For model structure, we tried auto-regressive and non-autoregressive models, respectively. ... The ST task contains two major components: Automatic Speech Recognition (ASR; Jelinek 1997) and Machine Translation (MT; Koehn 2009). ...
arXiv:2107.11572v1 · fatcat:jf5y5jghgjeprbvh23mur2xism
Fulltext: https://web.archive.org/web/20210730164216/https://arxiv.org/pdf/2107.11572v1.pdf · https://arxiv.org/abs/2107.11572v1

MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition [article]

Iustina Andronic and Ludwig Kürzinger and Edgar Ricardo Chavez Rosas and Gerhard Rigoll and Bernhard U. Seeber
2020-07-25 · arXiv · pre-print
Audio Adversarial Examples (AAE) represent specially created inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification. ... Our method is then validated by two objective indicators: (1) Character Error Rates (CER) that measure the speech decoding performance of four ASR models trained on uncompressed, as well as MP3-compressed ... The speech recognition experiments are performed with the hybrid CTC-attention ASR system called ESPnet [25, 24], which combines the two main techniques for end-to-end speech recognition. ...
arXiv:2007.12892v1 · fatcat:r3igokgsivhqvoecbiswxklphe
Fulltext: https://web.archive.org/web/20200806171119/https://arxiv.org/pdf/2007.12892v1.pdf · https://arxiv.org/abs/2007.12892v1

Deliberation Model for On-Device Spoken Language Understanding [article]

Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer
2022-04-04 · arXiv · pre-print
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and ... pipeline NLU baselines by 0.82% to 1.34% across various operating points on the spoken version of the TOPv2 dataset. ... This setup demonstrated superior results over non-autoregressive and autoregressive alternatives at smaller sizes on TOPv2 [23]. ...
arXiv:2204.01893v1 · fatcat:gwtu4e7to5h7blyupgxzlalvlq
Fulltext: https://web.archive.org/web/20220407033335/https://arxiv.org/pdf/2204.01893v1.pdf · https://arxiv.org/abs/2204.01893v1

Emotion Recognition from Speech Signals Using DCNN with Hybrid GA-GWO Algorithm

2019-10-31 · Multimedia Research (Resbee Publisher)
This paper develops emotion recognition from the speech signal in an accurate way, with the knowledge of numerous examined models. ... In recent days, recognition of emotion from the speech signal is considered an extensive advanced investigation subject because the speech signal is considered the rapid and natural method to ... At last, for the different methods, the distinct estimators of unvoiced speech, voiced speech, as well as speech non-presence were derived. ...
doi:10.46253/j.mr.v2i4.a2 · fatcat:px7hmmemofg5jdtcytfk3at3qy
Fulltext: https://web.archive.org/web/20201106000046/https://publisher.resbee.org/mr/archive/v2i4/a2/p2.pdf · https://doi.org/10.46253/j.mr.v2i4.a2

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram

Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
2019-09-15 · Interspeech 2019 (ISCA)
More recently, non-linear neural network autoregressive models have become popular for speech generation after the success of WaveNet (van den Oord et al., 2016). ... speech-recognition toolkit (Young et al., 2002). ...
doi:10.21437/interspeech.2019-2008 · dblp:conf/interspeech/JuvelaBYA19 · fatcat:bd6cc74arvb3joauc3wbeqnkba
Fulltext: https://web.archive.org/web/20201105114300/https://aaltodoc.aalto.fi/bitstream/handle/123456789/44214/isbn9789526039107.pdf;jsessionid=164992DDB8A349A326F64F065039E2FD?sequence=4 · https://doi.org/10.21437/interspeech.2019-2008

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing [article]

Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou (+7 others)
2022-01-24 · arXiv · pre-print
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. ... By this means, WavLM not only keeps the speech content modeling capability by the masked speech prediction, but also improves the potential for non-ASR tasks by the speech denoising. ... Recent works propose to predict future frames from the history with an autoregressive model [24]-[27], or recover the masked frames from the corrupted speech with a non-autoregressive model [28] ...
arXiv:2110.13900v4 · fatcat:eg4kyupazzcwfevpkno5eiqbeu
Fulltext: https://web.archive.org/web/20220207202728/https://arxiv.org/pdf/2110.13900v4.pdf · https://arxiv.org/abs/2110.13900v4

Multi-head Monotonic Chunkwise Attention For Online Speech Recognition [article]

Baiji Liu and Songjun Cao and Sining Sun and Weibin Zhang and Long Ma
2020-05-01 · arXiv · pre-print
The attention mechanism of the Listen, Attend and Spell (LAS) model requires the whole input sequence to calculate the attention context and thus is not suitable for online speech recognition. ... On another 18,000-hour in-car speech data set, MTH-MoChA obtains 7.28% CER, which is significantly better than a state-of-the-art hybrid system. ... The effectiveness of the above structure and training strategies is demonstrated on both an open-source corpus with limited training data and an in-car dataset with 18,000 hours of transcribed speech. ...
arXiv:2005.00205v1 · fatcat:qy2et47ti5ezvaxjefzmydkodq
Fulltext: https://web.archive.org/web/20200826013105/https://arxiv.org/pdf/2005.00205v1.pdf · https://arxiv.org/abs/2005.00205v1

Large scale weakly and semi-supervised learning for low-resource video ASR [article]

Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
2020-08-07 · arXiv · pre-print
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems. ... We investigate distillation methods at the frame level and the sequence level for hybrid, encoder-only CTC-based, and encoder-decoder speech recognition systems on Dutch and Romanian languages using 27,000 ... Sequence-level distillation is the first form of self-labelling applied for speech recognition [20, 9, 27, 5]. It's also commonly used for non-autoregressive machine translation systems [28]. ...
arXiv:2005.07850v2 · fatcat:l6zcwjqqijfdzh53qtm2goiv5y
Fulltext: https://web.archive.org/web/20200817224314/https://arxiv.org/pdf/2005.07850v2.pdf · https://arxiv.org/abs/2005.07850v2
Showing results 1–15 of 750