SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
2019 arXiv   pre-print
We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients).  ...  We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks.  ...  For example, in [9, 10], artificial data was augmented for low resource speech recognition tasks. Vocal Tract Length Normalization has been adapted for data augmentation in [11].  ... 
arXiv:1904.08779v2 fatcat:ay2xqzwsgbgtbi2hydlxfk7gza

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
2019 Interspeech 2019  
We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients).  ...  We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks.  ...  Acknowledgements: We would like to thank Yuan Cao, Ciprian Chelba, Kazuki Irie, Ye Jia, Anjuli Kannan, Patrick Nguyen, Vijay Peddinti, Rohit Prabhavalkar, Yonghui Wu and Shuyuan Zhang for useful discussions  ... 
doi:10.21437/interspeech.2019-2680 dblp:conf/interspeech/ParkCZCZCL19 fatcat:7ypa4xztjvbelbjoqgyf424qom

SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification [article]

Helin Wang, Yuexian Zou, Wenwu Wang
2021 arXiv   pre-print
In this paper, we present SpecAugment++, a novel data augmentation method for deep neural networks based acoustic scene classification (ASC).  ...  Different from other popular data augmentation methods such as SpecAugment and mixup that only work on the input space, SpecAugment++ is applied to both the input space and the hidden space of the deep  ...  proposed method is simple and computationally cheap to apply, which has shown better performance than the state-of-the-art methods.  ... 
arXiv:2103.16858v3 fatcat:3sskspi4bjayhdktpn5zs3n2fq

On Using SpecAugment for End-to-End Speech Translation [article]

Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
2019 arXiv   pre-print
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation.  ...  We also examine the effectiveness of the method in a variety of data scenarios and show that the method also leads to significant improvements in various data conditions irrespective of the amount of training  ...  Conclusion: We have studied SpecAugment, a simple and low-cost data augmentation for end-to-end direct speech translation.  ... 
arXiv:1911.08876v1 fatcat:omlozz7m3zaifhjkrpd3kznxaa

On Using SpecAugment for End-to-End Speech Translation

Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
2019 Zenodo  
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation.  ...  We also examine the effectiveness of the method in a variety of data scenarios and show that the method also leads to significant improvements in various data conditions irrespective of the amount of training  ...  Conclusion: We have studied SpecAugment, a simple and low-cost data augmentation for end-to-end direct speech translation.  ... 
doi:10.5281/zenodo.3525009 fatcat:xwpicnx3cjdwhmwpi7hcmnr26i

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment [article]

Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney
2020 arXiv   pre-print
Data augmentation using SpecAugment is successfully applied to improve performance on top of our best SAT model using i-vectors.  ...  We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus.  ...  We thank Albert Zeyer, Christoph Lüscher, Pavel Golik, Peter Vieting and Tobias Menne for useful discussions.  ... 
arXiv:2004.00960v1 fatcat:bkec6ohawzeyjgew2fhjrzkame

MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition [article]

Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu
2021 arXiv   pre-print
In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR).  ...  Experimental results show that MixSpeech achieves better accuracy than the baseline models without data augmentation, and outperforms a strong data augmentation method SpecAugment on these recognition  ...  In this paper, we propose MixSpeech, a simple yet effective data augmentation method for automatic speech recognition.  ... 
arXiv:2102.12664v1 fatcat:qgbgscsg6rhnhovxzcaouy7fcu

SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation [article]

Arya D. McCarthy, Liezl Puzon, Juan Pino
2020 arXiv   pre-print
We propose autoencoding speaker conversion for training data augmentation in automatic speech translation.  ...  Our method compares favorably to SpecAugment on English→French and English→Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task  ...  They can leverage large-scale automatic speech recognition (ASR) and machine translation (MT) training datasets.  ... 
arXiv:2002.12231v1 fatcat:epuqipqds5ckloljxaekxmmixu

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling [article]

Haiwei Wu, Lin Zhang, Lin Yang, Xuyang Wang, Junjie Wang, Dong Zhang, Ming Li
2020 arXiv   pre-print
Several data augmentation schemes are used to increase the quantity of training data and improve our models' robustness, including speed perturbation, SpecAugment, and random erasing.  ...  For the speech breath monitoring task, we investigate different bottleneck features based on the Bi-LSTM structure.  ...  We want to thank Antonia Hamilton and Alexis Macintyre from University College London for sharing the speech breathing dataset with us for this paper.  ... 
arXiv:2008.05175v2 fatcat:5icoyftgkza6xik4bsk3pj6yra

An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients' Speech

Jiho Jeong, S. I. M. M. Raton Mondol, Yeon Wook Kim, Sangmin Lee
2021 Electronics  
The automatic speech recognition (ASR) model usually requires a large amount of training data to provide better results compared with the ASR models trained with a small amount of training data.  ...  The proposed method achieved a CER of 36.03% on the CI patient's speech test dataset using only 2 h and 30 min of training data, which is a 62% improvement over the basic method.  ...  Second, we used a data augmentation technique and selected the augmentation method [10, 11] used for standard speech.  ... 
doi:10.3390/electronics10070807 fatcat:awvng4rrefchjp5vlz2zq4y7am

On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR [article]

Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, Stefan Riezler
2021 arXiv   pre-print
We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples.  ...  Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate previously unseen training pairs.  ...  We'd like to thank the reviewers for their helpful comments, which we mainly address on our github repository page due to length constraints.  ... 
arXiv:2104.01393v2 fatcat:yzx6szt3dnabtif2sfa5b7blvu

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems [article]

Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney
2020 arXiv   pre-print
We compare our method with language model integration of the same text data and with simple data augmentation methods like SpecAugment and show that performance improvements are mostly independent.  ...  We extend state-of-the-art attention-based automatic speech recognition (ASR) systems with synthetic audio generated by a TTS system trained only on the ASR corpora itself.  ...  For the following experiments we only use SpecAugment as data-augmentation method.  ... 
arXiv:1912.09257v2 fatcat:7vjvlzmmw5ezvdfc7glmgdgdiy

SapAugment: Learning A Sample Adaptive Policy for Data Augmentation [article]

Ting-Yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel
2021 arXiv   pre-print
We apply our method on an automatic speech recognition (ASR) task, and combine existing and novel augmentations using the proposed framework.  ...  Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples.  ...  The state-of-the-art augmentation method for the ASR task is a fixed augmentation policy called SpecAugment [10], which perturbs the data in feature (log mel spectrogram) domain.  ... 
arXiv:2011.01156v2 fatcat:2qsybnrrtfetxnmnp2ew3d3rnm

A comparison of streaming models and data augmentation methods for robust speech recognition [article]

Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim
2021 arXiv   pre-print
All these advantages make RNN-T models a better choice for streaming on-device speech recognition compared to MoChA models.  ...  We explore three recently proposed data augmentation techniques, namely, multi-conditioned training using an acoustic simulator, Vocal Tract Length Perturbation (VTLP) for speaker variability, and SpecAugment  ...  For a relevant research, [7] uses TTS approach and [20] selects SpecAugment as an augmentation method.  ... 
arXiv:2111.10043v1 fatcat:degmunndkvfkngswzagdwp4rsm

Data Augmentation for End-to-End Speech Translation: FBK@IWSLT '19

Mattia A. Di Gangi, Matteo Negri, Viet Nhat Nguyen, Amirhossein Tebbifakhr, Marco Turchi
2019 Zenodo  
On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR).  ...  Our participation had a twofold goal: i) testing our latest models, and ii) evaluating the contribution to model training of different data augmentation techniques.  ...  Acknowledgements: This work is part of a project financially supported by an Amazon AWS ML Grant. We thank Mauro Cettolo for the useful technical conversations.  ... 
doi:10.5281/zenodo.3525492 fatcat:yvmfqs3gqrainc2gno4f5eoyhe