SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network [article]

William Chan, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, Mohammad Norouzi
2021 arXiv pre-print
We present SpeechStew, a speech recognition model that is trained on a combination of various publicly available speech recognition datasets: AMI, Broadcast News, Common Voice, LibriSpeech, Switchboard ... SpeechStew simply mixes all of these datasets together, without any special re-weighting or re-balancing of the datasets. ... Acknowledgements: We give thanks to Chung-Cheng Chiu, David Fleet, Naoyuki Kanda, Bo Li, and Samy Bengio for their discussions and review of the manuscript.
arXiv:2104.02133v3 fatcat:5g7qwzc3ibcinmwbgjq6run2mu

SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training [article]

Ankur Bapna, Yu-an Chung, Nan Wu, Anmol Gulati, Ye Jia, Jonathan H. Clark, Melvin Johnson, Jason Riesa, Alexis Conneau, Yu Zhang
2021 arXiv pre-print
... recognition data. ... pre-trained models, while retaining close to SotA performance on LibriSpeech and SpeechStew ASR tasks. ... Related Work: Self-supervised learning of language representations using neural networks has a long history. ...
arXiv:2110.10329v1 fatcat:eat26t3cwrhudenhehivfljqvm