WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition [article]

Binbin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng
2022 arXiv   pre-print
In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech,  ...  To the best of our knowledge, WenetSpeech is the current largest open-sourced Mandarin speech corpus with transcriptions, which benefits research on production-level speech recognition.  ...  We thank Tencent Ethereal Audio Lab and Xi'an Future AI Innovation Center for providing hosting service for WenetSpeech.  ... 
arXiv:2110.03370v5 fatcat:wipuf337nbg4hjwrur4njwiw34

TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline [article]

Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, Jinfeng Bai
2022 arXiv   pre-print
This paper introduces a new corpus for Mandarin-English code-switching speech recognition, the TALCS corpus, suitable for training and evaluating code-switching speech recognition systems.  ...  Using the TALCS corpus, we conduct ASR experiments with two popular speech recognition toolkits, ESPnet and Wenet, to build a baseline system.  ...  WENETSPEECH [3] also provides a multi-domain Mandarin corpus for speech recognition, which consists of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours  ... 
arXiv:2206.13135v1 fatcat:rhldqph355a5bagwtfe55bzqw4

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset [article]

Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan
2022 arXiv   pre-print
The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz.  ...  As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research  ...  WenetSpeech [12] is a recently released Mandarin corpus containing more than 10000 hours of speech collected from YouTube and podcasts, adopted with optical character recognition (OCR) and automatic  ... 
arXiv:2203.16844v1 fatcat:dnj2fsqxn5dj7brc5ephzktkty

The MSXF TTS System for ICASSP 2022 ADD Challenge [article]

Chunyong Yang, Pengfei Liu, Yanli Chen, Hongbin Wang, Min Liu
2022 arXiv   pre-print
We use an end-to-end text-to-speech system and add a constraint loss to the system during training. The end-to-end TTS system is VITS, and the pre-trained self-supervised model is wav2vec 2.0.  ...  This paper presents our MSXF TTS system for Task 3.1 of the Audio Deep Synthesis Detection (ADD) Challenge 2022.  ...  The wav2vec 2.0 model is trained using the open-source data WenetSpeech [14], a 10000+ hours multi-domain Mandarin corpus.  ... 
arXiv:2201.11400v1 fatcat:luagwlz4fretfg6dfngmfx3ree