464 Hits in 1.0 sec

SpecAugment on Large Scale Datasets [article]

Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu
2019 arXiv   pre-print
In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Narayanan et al., 2018).  ...  We also introduce a modification of SpecAugment that adapts the time mask size and/or multiplicity depending on the length of the utterance, which can potentially benefit large scale tasks.  ...  We have commented on SpecAugment [1] in the introduction. Data augmentation has also been successfully applied to large scale industrial datasets.  ... 
arXiv:1912.05533v1 fatcat:6dgaybq7x5agtls5cndvay4mj4
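
As a rough illustration of the length-adaptive masking described in this abstract, the sketch below scales both the time-mask width and the number of masks with the utterance length. The parameter names (p_size, p_mult, max_masks) and defaults are assumptions for illustration, not the policy from the paper.

```python
import numpy as np

def adaptive_time_mask(spec, p_size=0.05, p_mult=0.04, max_masks=20, rng=None):
    """Length-adaptive time masking in the spirit of adaptive SpecAugment:
    both the maximum mask width and the number of masks scale with the
    utterance length T. Parameter names and defaults are illustrative only.

    spec: np.ndarray of shape (T, F) holding log-mel features for one utterance.
    """
    rng = rng or np.random.default_rng()
    out = spec.copy()
    T = out.shape[0]
    max_width = max(1, int(p_size * T))         # mask size grows with utterance length
    n_masks = min(max_masks, int(p_mult * T))   # mask multiplicity grows with length, capped
    for _ in range(n_masks):
        width = int(rng.integers(0, max_width + 1))
        start = int(rng.integers(0, max(1, T - width + 1)))
        out[start:start + width, :] = 0.0       # zero out the chosen frame span
    return out

# A 1000-frame utterance receives larger and more numerous masks than a 200-frame one.
long_utt = adaptive_time_mask(np.random.randn(1000, 80))
short_utt = adaptive_time_mask(np.random.randn(200, 80))
```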

Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems [article]

Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu
2020 arXiv   pre-print
As the benefits of augmentation techniques tend to diminish as training data size increases, the large scale training reported is important in understanding the effectiveness of f-SpecAugment.  ...  We evaluate the proposed f-SpecAugment on 50-layer Self-Normalizing Deep CNN (SNDCNN) acoustic models trained with up to 25000 hours of training data.  ...  Results on large scale datasets: To study the effectiveness of f-SpecAugment on large scale datasets, we conducted experiments on the full training sets for Japanese, French, Indian English and Mandarin  ... 
arXiv:2012.04094v1 fatcat:o25dtgjq6bgr3aozlmmdehxb3m

Data augmentation using prosody and false starts to recognize non-native children's speech [article]

Hemant Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo
2020 arXiv   pre-print
Acoustic models trained on prosody-based augmented data outperform the models using the baseline recipe or the SpecAugment-based augmentation.  ...  This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition (ASR) for non-native children's speech.  ...  Training on the combined data leads to large performance improvements in terms of Word-Error-Rate (WER), as noted in Table 4.  ... 
arXiv:2008.12914v1 fatcat:xusvpwcohjfrjk2g7kk7hyp3ju

On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR [article]

Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, Stefan Riezler
2021 arXiv   pre-print
Our experiments on a Seq-to-Seq architecture show that ADA can be applied on top of SpecAugment, and achieves about 9-23% and 4-15% relative improvements in WER over SpecAugment alone on LibriSpeech 100h  ...  and LibriSpeech 960h test datasets, respectively.  ...  We'd like to thank the reviewers for their helpful comments, which we mainly address on our github repository page due to length constraints.  ... 
arXiv:2104.01393v2 fatcat:yzx6szt3dnabtif2sfa5b7blvu

Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech [article]

Vishwanath Pratap Singh, Hardik Sailor, Supratik Bhattacharya, Abhishek Pandey
2022 arXiv   pre-print
on the LibriSpeech 100-hour adult speech dataset.  ...  (ASR) system for children's speech recognition is a challenging task due to inherent differences in acoustic attributes of adult and child speech and scarcity of publicly available children's speech dataset  ...  In all of the experiments, SpecAugment [13] has been applied by default and other mentioned augmentation techniques are applied on top of SpecAugment.  ... 
arXiv:2203.06600v1 fatcat:6lursxhtdrg5vfyx3i4g2ks2fm

Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech

Hemant Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo
2020 Interspeech 2020  
Acoustic models trained on prosody-based augmented data outperform the models using the baseline recipe or the SpecAugment-based augmentation.  ...  This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition (ASR) for non-native children's speech.  ...  Training on the combined data leads to large performance improvements in terms of Word-Error-Rate (WER), as noted in Table 4.  ... 
doi:10.21437/interspeech.2020-2199 dblp:conf/interspeech/KathaniaSGK20 fatcat:ub4nmoh77bhazcqgka5cntr2k4

SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation [article]

Arya D. McCarthy and Liezl Puzon and Juan Pino
2020 arXiv   pre-print
Our method compares favorably to SpecAugment on English→French and English→Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task  ...  Finally, we show that we can combine our approach with augmentation by machine-translated transcripts to obtain a competitive end-to-end AST model that outperforms a very strong cascade model on an English  ...  They can leverage large-scale automatic speech recognition (ASR) and machine translation (MT) training datasets.  ... 
arXiv:2002.12231v1 fatcat:epuqipqds5ckloljxaekxmmixu

Transfer Learning and SpecAugment applied to SSVEP Based BCI Classification [article]

Pedro R. A. S. Bassi, Willian Rampazzo, Romis Attux
2020 arXiv   pre-print
The results, when excluding the evaluated user's data from the fine-tuning process, reached 99.3% mean test accuracy and 0.992 mean F1 score on 35 subjects from an open dataset.  ...  We applied a second technique, data augmentation, mostly SpecAugment, generally employed in speech recognition.  ...  Deep neural networks (DNNs) perform very well when trained on a large amount of data [1], but large SSVEP datasets are not commonly available for open use.  ... 
arXiv:2010.06503v1 fatcat:mwmuu4yahzcjlgq52qj3xv4emy

Improving X-Vector and PLDA for Text-Dependent Speaker Verification

Zhuxin Chen, Yue Lin
2020 Interspeech 2020  
Experimental results on the SDSVC 2020 dataset show that our proposed methods achieve significant performance improvement compared with the x-vector and HMM based i-vector baselines.  ...  Prior studies have found that x-vectors leverage large-scale training datasets better than i-vectors [2, 3]. In addition, the back-end similarity measurement plays an important role.  ...  We adjust the output dimension of LDA based on the results of the development set. PLDA adaptation: It is well known that the performance of an SV system benefits from large-scale in-domain data.  ... 
doi:10.21437/interspeech.2020-1188 dblp:conf/interspeech/ChenL20 fatcat:qacuyqhsszgohkfjoncm263vy4
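
The snippet above mentions tuning the LDA output dimension on a development set before PLDA scoring. The sketch below illustrates one way such a sweep could look, using synthetic x-vectors and scikit-learn's LDA; nearest-centroid accuracy serves as a stand-in for the verification metric, and PLDA itself (including its in-domain adaptation) is not part of scikit-learn and is omitted.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for development-set x-vectors: 300 speakers x 10 utterances,
# 512-dimensional embeddings (real x-vectors come from the trained extractor).
rng = np.random.default_rng(0)
dev_xvec = rng.standard_normal((3000, 512))
dev_spk = np.repeat(np.arange(300), 10)

# Sweep candidate LDA output dimensions; nearest-centroid speaker classification
# accuracy on the projected vectors is used here as a cheap proxy for the
# verification metric one would actually compute on development trials.
best_dim, best_score = None, -np.inf
for dim in (100, 150, 200):
    proxy = make_pipeline(LinearDiscriminantAnalysis(n_components=dim), NearestCentroid())
    score = cross_val_score(proxy, dev_xvec, dev_spk, cv=3).mean()
    if score > best_score:
        best_dim, best_score = dim, score

# Project x-vectors with the selected dimension before PLDA scoring
# (PLDA and its adaptation are outside scikit-learn and omitted here).
lda = LinearDiscriminantAnalysis(n_components=best_dim).fit(dev_xvec, dev_spk)
projected = lda.transform(dev_xvec)
print(best_dim, projected.shape)
```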

Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models [article]

Zhiyun Lu, Liangliang Cao, Yu Zhang, Chung-Cheng Chiu, James Fan
2020 arXiv   pre-print
We use the well-benchmarked IEMOCAP dataset and a new large-scale speech sentiment dataset SWBD-sentiment for evaluation.  ...  Our approach improves the state-of-the-art accuracy on IEMOCAP from 66.6% to 71.7%, and achieves an accuracy of 70.10% on SWBD-sentiment with more than 49,500 utterances.  ...  Moreover, we create a large-scale speech sentiment dataset SWBD-sentiment to facilitate future research in this field.  ... 
arXiv:1911.09762v2 fatcat:mh57xcoz7bbhxaauuvkhgt3dya

Learning Higher Representations from Pre-Trained Deep Models with Data Augmentation for the COMPARE 2020 Challenge Mask Task

Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto
2020 Interspeech 2020  
Unlike the previous studies mainly based on models pre-trained on image data, we use a pre-trained model based on large scale audio data, i.e., AudioSet.  ...  In addition, the SpecAugment and mixup methods are used to improve the generalisation of the deep models.  ...  proposed the PANNs which are pre-trained on large-scale audio data, i.e., the AudioSet [23].  ... 
doi:10.21437/interspeech.2020-1552 dblp:conf/interspeech/Koike0SY20 fatcat:ypz6r6mbcvdpfhwhrdkd3m2vpm
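
For the mixup augmentation mentioned alongside SpecAugment in this abstract, a minimal NumPy sketch is given below. The Beta parameter (alpha=0.2) and the batch shapes are illustrative assumptions, not the settings reported in the paper.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Minimal mixup: blend every example with a randomly chosen partner.

    x: (B, T, F) batch of audio features, y_onehot: (B, C) one-hot labels.
    alpha controls the Beta distribution the mixing weight is drawn from;
    0.2 is a common default, not necessarily the value used in the paper.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix

# Example: a batch of 16 log-mel spectrograms with 2 classes (mask / no mask).
features = np.random.randn(16, 400, 64)
labels = np.eye(2)[np.random.randint(0, 2, size=16)]
mixed_x, mixed_y = mixup_batch(features, labels)
```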

Training Keyword Spotting Models on Non-IID Data with Federated Learning

Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio Lopez Moreno, Rajiv Mathews
2020 Interspeech 2020  
algorithms and hyperparameter configurations using large-scale federated simulations.  ...  To overcome resource constraints, we replace memory-intensive MTR data augmentation with SpecAugment, which reduces the false reject rate by 56%.  ...  Using simulated federated learning experiments on large-scale datasets consisting of thousands of speakers and millions of utterances, we address the algorithmic challenges associated with training on  ... 
doi:10.21437/interspeech.2020-3023 dblp:conf/interspeech/HardPNSSZLM20 fatcat:5ggpsfoi25fkld7c7wdqkfafde
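
The federated training referred to above builds on federated averaging. The sketch below shows a single FedAvg aggregation round in plain NumPy as a simplified illustration; the paper's actual study of optimization algorithms and non-IID client data is not reproduced here.

```python
import numpy as np

def fedavg_round(client_updates):
    """One aggregation round of federated averaging (FedAvg).

    client_updates: list of (num_examples, weights) pairs, where weights is a
    list of np.ndarray holding one client's locally trained model parameters.
    Returns the example-weighted average of the client models.
    """
    total_examples = sum(n for n, _ in client_updates)
    n_layers = len(client_updates[0][1])
    averaged = []
    for layer in range(n_layers):
        layer_sum = sum(n * w[layer] for n, w in client_updates)
        averaged.append(layer_sum / total_examples)
    return averaged

# Example: three simulated clients with different amounts of local data.
shapes = [(64, 32), (32,)]
clients = [(n, [np.random.randn(*s) for s in shapes]) for n in (120, 40, 300)]
new_global = fedavg_round(clients)
```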

A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data [article]

Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar
2021 arXiv   pre-print
Second, we demonstrate that augmenting our training dataset of real-world examples with a large synthetic dataset improves performance.  ...  Crucially, applying SpecAugment-style masks to the reference channel during training aids the model in adapting from synthetic to real domains.  ...  The first two models were trained on the train partition of LibriSpeech, and the last model was trained on a large corpus of far-field and near-field non-LibriSpeech utterances.  ... 
arXiv:2106.00856v1 fatcat:rkm6mujjqjeuhde5ofv53wbo2m
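
The reference-channel masking mentioned in this abstract could look roughly like the sketch below: SpecAugment-style time and frequency masks are applied to the echo-reference features only, leaving the microphone channel untouched. The mask counts, widths, and feature shapes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def mask_reference_channel(mic_feats, ref_feats, n_time_masks=2, max_t=30,
                           n_freq_masks=2, max_f=8, rng=None):
    """Apply SpecAugment-style masks to the echo-reference channel only.

    mic_feats, ref_feats: (T, F) log-spectrogram features for one training
    example; only the reference copy is modified. Defaults are illustrative.
    """
    rng = rng or np.random.default_rng()
    ref = ref_feats.copy()
    T, F = ref.shape
    for _ in range(n_time_masks):
        w = int(rng.integers(0, max_t + 1))
        t0 = int(rng.integers(0, max(1, T - w + 1)))
        ref[t0:t0 + w, :] = 0.0                 # blank a span of reference frames
    for _ in range(n_freq_masks):
        w = int(rng.integers(0, max_f + 1))
        f0 = int(rng.integers(0, max(1, F - w + 1)))
        ref[:, f0:f0 + w] = 0.0                 # blank a band of reference bins
    return mic_feats, ref                       # the AEC model consumes both channels

mic, ref = np.random.randn(500, 128), np.random.randn(500, 128)
mic_out, ref_out = mask_reference_channel(mic, ref)
```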

Training Keyword Spotting Models on Non-IID Data with Federated Learning [article]

Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio Lopez Moreno, Rajiv Mathews
2020 arXiv   pre-print
algorithms and hyperparameter configurations using large-scale federated simulations.  ...  To overcome resource constraints, we replace memory-intensive MTR data augmentation with SpecAugment, which reduces the false reject rate by 56%.  ...  The authors would like to thank Google Research colleagues for providing the FL framework, Manzil Zaheer for his optimizer expertise, and Daniel Park for SpecAugment discussions.  ... 
arXiv:2005.10406v2 fatcat:yhfuwayucjandmnawwvtdclwie

NeurST: Neural Speech Translation Toolkit [article]

Chengqi Zhao and Mingxuan Wang and Qianqian Dong and Rong Ye and Lei Li
2021 arXiv   pre-print
The toolkit mainly focuses on end-to-end speech translation, which is easy to use, modify, and extend to advanced speech translation research and products.  ...  In this paper, we will introduce the framework design of NeurST and show experimental results for different benchmark datasets, which can be regarded as reliable baselines for future research.  ...  Hence, we further extended NeurST to large-scale scenarios and experimented on the allowed datasets for the IWSLT 2021 evaluation campaign.  ... 
arXiv:2012.10018v3 fatcat:awpgt22zqffnrowfay5dapmw3e
Showing results 1 — 15 out of 464 results