588 Hits in 4.1 sec

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset [article]

Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
2021-02-28 · arXiv · pre-print
In this work, we explored the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset. ... Recently, Transformer-based end-to-end models have achieved great success in many areas including speech recognition. ... CONCLUSION We develop streaming T-T and C-T speech recognition models for real-time speech recognition, in the hope that the powerful Transformer encoder and streaming-natural transducer architecture could ...
arXiv:2010.11395v3 · fatcat:ujmi6hytprgjnengtm7rpleq6y

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit [article]

Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu
2022-03-29 · arXiv · pre-print
Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming  ...  production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training.  ...  To address the aforementioned problems for large-scale production datasets and keep the high efficiency for small datasets at the same time, we design a unified IO system, which provides a unified interface  ... 
arXiv:2203.15455v1 · fatcat:qnl7y5zwsrbmlbb6y6oc6bndlq

Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition [article]

Li Fu, Xiaoxiao Li, Libo Zi
2020-04-26 · arXiv · pre-print
To improve the performance of RNN-T for the Mandarin speech recognition task, a novel transformer transducer combining a self-attention transformer and an RNN is proposed. ... The choice of different modeling units for the transformer transducer is then explored. ... To improve the performance of RNN-T for Mandarin speech recognition, we study the effect of different modeling units for the transformer transducer. ...
arXiv:2004.13522v1 · fatcat:uouecsqz3ze5fbo3ck7yrbfe6u

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling [article]

Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N. Sainath, Yonghui Wu, Ruoming Pang
2021-01-27 · arXiv · pre-print
We present extensive experiments with two state-of-the-art ASR networks, ContextNet and Conformer, on two datasets: the widely used public LibriSpeech dataset and a large-scale MultiDomain dataset. ... Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before ... A single trained DSNN can transform into multiple networks of different sparsities for adaptive inference in real time. ...
arXiv:2010.06030v2 · fatcat:hiwfxvkvafh2tcfwlqcbknpo4e

WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit [article]

Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei
2021-12-29 · arXiv · pre-print
Our experiments on the AISHELL-1 dataset using WeNet show that our model achieves a 5.03% relative character error rate (CER) reduction in non-streaming ASR compared to a standard non-streaming transformer ... end-to-end (E2E) speech recognition in a single model. ... and radio broadcast, to show our model's ability on an industry-scale dataset. ...
arXiv:2102.01547v5 · fatcat:pfgprq2khrf6dotsr3rwivg4n4

Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications [article]

Yongqiang Wang, Yangyang Shi, Frank Zhang, Chunyang Wu, Julian Chan, Ching-Feng Yeh, Alex Xiao
2020-10-29 · arXiv · pre-print
In this paper, we summarize the application of the transformer and its streamable variant, the Emformer-based acoustic model, to large-scale speech recognition applications. ... For medium-latency scenarios, compared with an LCBLSTM of similar model size and latency, Emformer gets significant WERR across four languages on video captioning datasets, with 2-3 times inference real-time ... Conclusions In this work, we compare LSTM-based acoustic models with transformer-based ones for a range of large-scale speech recognition tasks. ...
arXiv:2010.14665v2 · fatcat:k5umdj3imrfcnbybfb5cucn2ne

Scaling Up Online Speech Recognition Using ConvNets

Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
2020-10-25 · Interspeech 2020 (ISCA)
We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). ... Index Terms: online speech recognition, low latency. The training recipe, along with inference code, is available at https://github.com/facebookresearch/wav2letter ... Furthermore, deployment of a real-time speech recognition system at scale poses a number of challenges. ...
doi:10.21437/interspeech.2020-2840 · dblp:conf/interspeech/PratapXKALHLSC20 · fatcat:7klms24dczg4ngqscqegmvj7ku
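The CTC criterion named in this entry scores a transcript by summing, over every frame-level alignment that collapses to it, the product of per-frame symbol posteriors. As a generic illustration (not code from any paper listed here; function names are our own), a minimal pure-Python sketch of the CTC forward algorithm, checked against brute-force path enumeration:

```python
import itertools

def ctc_prob(probs, label, blank=0):
    """P(label | per-frame posteriors) via the CTC forward algorithm.

    probs: probs[t][k] = posterior of symbol k at frame t.
    label: target symbol ids, with no blanks."""
    ext = [blank]
    for s in label:
        ext += [s, blank]                    # blank-interleaved label: ^ a ^ b ^
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]              # stay on the same state
            if s > 0:
                a += alpha[t - 1][s - 1]     # advance by one state
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]     # skip the blank between distinct labels
            alpha[t][s] = a * probs[t][ext[s]]
    # valid paths end on the last label or the trailing blank
    return alpha[-1][-1] + (alpha[-1][-2] if S > 1 else 0.0)

def collapse(path, blank=0):
    """CTC collapse: merge repeated symbols, then drop blanks."""
    return [k for k, _ in itertools.groupby(path) if k != blank]

def ctc_prob_brute(probs, label, blank=0):
    """Sum path probabilities over every alignment that collapses to label."""
    T, V = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(V), repeat=T):
        if collapse(path, blank) == list(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total
```

On a toy example with a few frames and a three-symbol vocabulary, the dynamic program and the exhaustive enumeration agree to floating-point precision.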

Transformer-Transducers for Code-Switched Speech Recognition [article]

Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff
2021-02-15 · arXiv · pre-print
In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. ... As automatic speech recognition (ASR) systems are deployed in the real world, there is a need for practical systems that can handle multiple languages, both within an utterance and across utterances ... Transducer Loss Background Transducer models are widely used for online speech recognition for their streaming capabilities and low memory footprint [44, 45, 46]. ...
arXiv:2011.15023v2 · fatcat:ugrxkmweffejhahdmqtmihe3iu
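The transducer loss this snippet refers to likewise marginalizes over alignments, but over a T×U lattice where each step either emits a blank (advance one audio frame) or a label symbol (advance one position in the transcript). A minimal pure-Python sketch of the forward recursion (generic, not this paper's implementation; `None` stands in for the blank symbol), verified against explicit enumeration of blank/label interleavings:

```python
import itertools

def rnnt_prob(out, label):
    """P(label | joint-network outputs) via the RNN-T forward recursion.

    out[t][u]: dict mapping symbol -> prob at frame t after emitting u labels;
    the key None plays the role of the blank symbol."""
    T, U = len(out), len(label)
    alpha = [[0.0] * (U + 1) for _ in range(T)]
    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            a = 0.0
            if t > 0:
                a += alpha[t - 1][u] * out[t - 1][u][None]          # blank: next frame
            if u > 0:
                a += alpha[t][u - 1] * out[t][u - 1][label[u - 1]]  # emit next label
            alpha[t][u] = a
    return alpha[T - 1][U] * out[T - 1][U][None]                    # final blank

def rnnt_prob_brute(out, label):
    """Enumerate every interleaving of T blanks and U label emissions."""
    T, U = len(out), len(label)
    total = 0.0
    for path in itertools.product((0, 1), repeat=T + U):
        if sum(path) != U or path[-1] != 0:   # exactly U emits; path ends on a blank
            continue
        t = u = 0
        p = 1.0
        for emit in path:
            if emit:
                p *= out[t][u][label[u]]      # emit next label, stay on frame t
                u += 1
            else:
                p *= out[t][u][None]          # blank, advance to frame t + 1
                t += 1
        total += p
    return total
```

The streaming appeal mentioned in the abstract is visible in the recursion: column t of the lattice depends only on frames up to t, so scores can be updated as audio arrives.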

Streaming Transformer-Based Acoustic Models Using Self-Attention with Augmented Memory

Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang
2020-10-25 · Interspeech 2020 (ISCA)
Our findings are also confirmed on some large internal datasets.  ...  Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition.  ...  Conclusions In this work, we proposed the augmented memory transformer for streaming transformer-based models for speech recognition.  ... 
doi:10.21437/interspeech.2020-2079 · dblp:conf/interspeech/WuWSYZ20 · fatcat:p5awv5fk4bbmvcknsendoyo624

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans [article]

Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li (+3 others)
2020-12-23 · arXiv · pre-print
This project was initiated in December 2017, mainly to support end-to-end speech recognition experiments based on sequence-to-sequence modeling. ... ESPnet also provides reproducible all-in-one recipes for these applications, with state-of-the-art performance on various benchmarks, by incorporating transformer, advanced data augmentation, and conformer ... large-scale corpus. ...
arXiv:2012.13006v1 · fatcat:zvvoohohdzesbh2t4b626xj3dq

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein
2020-10-25 · Interspeech 2020 (ISCA)
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system.  ...  Besides, this model must be tiny, fast, and perform inference in a streaming fashion, in order to have minimal impact on CPU, memory, battery and latency.  ...  The authors would like to thank Philip Chao, Sinan Akay, John Han, Stephen Wu, Yiteng Huang, Jaclyn Konzelmann and Nino Tasca for the support and helpful discussions.  ... 
doi:10.21437/interspeech.2020-1193 · dblp:conf/interspeech/WangLSWCLHLPNG20 · fatcat:7bi4ldkrujg4pekpqllu4x6fpi

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition [article]

Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein
2020-09-09 · arXiv · pre-print
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system.  ...  Besides, this model must be tiny, fast, and perform inference in a streaming fashion, in order to have minimal impact on CPU, memory, battery and latency.  ...  The authors would like to thank Philip Chao, Sinan Akay, John Han, Stephen Wu, Yiteng Huang, Jaclyn Konzelmann and Nino Tasca for the support and helpful discussions.  ... 
arXiv:2009.04323v1 · fatcat:prqvmgsek5dopm7vtkby5y2zru

Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory [article]

Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang
2020-05-16 · arXiv · pre-print
Our findings are also confirmed on some large internal datasets. ... Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. ... Conclusions In this work, we proposed the augmented memory transformer for streaming transformer-based models for speech recognition. ...
arXiv:2005.08042v1 · fatcat:7u4uyw6ywvbf3iq7ygkfnydqou

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition [article]

Takaki Makino, Brendan Shillingford
2019-11-08 · arXiv · pre-print
This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. ... To support the development of such a system, we built a large audio-visual (A/V) dataset of segmented utterances extracted from public YouTube videos, leading to 31k hours of audio-visual training content ... In 2016, researchers proposed a novel end-to-end trained model for visual speech recognition [19] and an approach to constructing large AV datasets suitable for AV-ASR [20]. ...
arXiv:1911.04890v1 · fatcat:m4fxjpwvsnbadi4uw2qylueazu

Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data

Ayesha Pervaiz, Fawad Hussain, Huma Israr, Muhammad Ali Tahir, Fawad Riasat Raja, Naveed Khan Baloch, Farruh Ishmanov, Yousaf Bin Zikria
2020-04-19 · Sensors (MDPI)
Noise is one of the major challenges in any speech recognition system, as real-time noise is a versatile and unavoidable factor that affects the performance of speech recognition systems, particularly ... We thoroughly analyse the latest trends in speech recognition and evaluate the speech command dataset with different machine-learning-based and deep-learning-based techniques. ... The book [53] lists applications of DNNs to large-vocabulary speech recognition. ...
doi:10.3390/s20082326 · pmid:32325814 · pmcid:PMC7219662 · fatcat:ftbpxexwd5fvbpj4s2cr76uybq
Showing results 1–15 of 588