
Common Voice: A Massively-Multilingual Speech Corpus [article]

Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, Gregor Weber
2020 arXiv   pre-print
The Common Voice corpus is a massively-multilingual collection of transcribed speech intended for speech technology research and development.  ...  As an example use case for Common Voice, we present speech recognition experiments using Mozilla's DeepSpeech Speech-to-Text toolkit.  ...  Acknowledgments Common Voice is a living project, and would not be possible without the thousands of hours given by volunteers.  ... 
arXiv:1912.06670v2 fatcat:cjkiojgww5fwfgvup422qga2xe

CoVoST 2 and Massively Multilingual Speech-to-Text Translation [article]

Changhan Wang, Anne Wu, Juan Pino
2020 arXiv   pre-print
With the aim to foster research in massive multilingual speech translation and speech translation for low resource language pairs, we release CoVoST 2, a large-scale multilingual speech translation corpus  ...  We also provide extensive speech recognition, bilingual and multilingual machine translation and speech translation baselines with open-source implementation.  ...  CoVoST is a multilingual and diversified ST corpus from 11 languages into English, based on the Common Voice project (Ardila et al., 2020).  ...
arXiv:2007.10310v3 fatcat:a6uajfmqenax5eqnpgvug4viou

Towards cross-lingual voice cloning in higher education

Alejandro Pérez, Gonçal Garcés Díaz-Munío, Adrià Giménez, Joan Albert Silvestre-Cerdà, Albert Sanchis, Jorge Civera, Manuel Jiménez, Carlos Turró, Alfons Juan
2021 Engineering applications of artificial intelligence  
This includes collecting 59 h of clean speech data from UPV's academic staff, and extending our production pipeline of subtitles with a state-of-the-art multilingual and multi-speaker text-to-speech system  ...  Here, a detailed account is given on how this work has been extended to also allow for massive machine dubbing of MediaUPV.  ...  A common challenge in both online and blended learning is how to produce multilingual video subtitles of publishable quality at scale and low cost.  ... 
doi:10.1016/j.engappai.2021.104413 fatcat:iaiphum4prfyjid4tcok24gsmu

AdaVocoder: Adaptive Vocoder for Custom Voice [article]

Xin Yuan, Yongbing Feng, Mingming Ye, Cheng Tuo, Minghang Zhang
2022 arXiv   pre-print
Custom voice is to construct a personal speech synthesis system by adapting the source speech synthesis model to the target model through the target few recordings.  ...  The solution to constructing a custom voice is to combine an adaptive acoustic model with a robust vocoder.  ...  The CSMSC is a single-speaker Chinese speech corpus, which contains 10,000 recordings totaling about 12 hours. We randomly assign 200 items as the test set and the rest as the training set.  ... 
arXiv:2203.09825v1 fatcat:vm3ftvintbewlgzlurato232cy

Predicting voice alternation across academic Englishes

Marianne Hundt, Melanie Röthlisberger, Elena Seoane
2018 Corpus Linguistics and Linguistic Theory  
For this purpose, we automatically retrieve central be-passives and active transitives from syntactically annotated International Corpus of English corpora and code for factors that are likely to play  ...  Academic writing in the second half of the twentieth century witnesses a notable decrease in be-passives in British and American English (AmE).  ...  Acknowledgements: This study builds on a previous investigation of voice alternation in ICE corpora.  ... 
doi:10.1515/cllt-2017-0050 fatcat:qengcdhjvbec3iwvnmqg7meb4u

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation [article]

Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen
2022 arXiv   pre-print
We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English.  ...  CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems  ...  Common Voice (Ardila et al., 2020) is a massively multilingual transcribed speech corpus designed for ASR.  ...
arXiv:2201.03713v3 fatcat:dspwplxaxber3jic532k5fqkxe

HASS RDC and Indigenous Research Capability: Language Data Commons of Australia (LDaCA) Project Proposal

Michael Haugh
2021 Zenodo  
Project proposal for the Language Data Commons of Australia (LDaCA)  ...  Proposal summary Australia is a massively multilingual country, in one of the world's most linguistically diverse regions.  ...  The Languages Data Commons of Australia (LDaCA) aims to establish a sustainable long-term repository for ingesting and curating language data collections of national significance: to democratise access  ... 
doi:10.5281/zenodo.6552013 fatcat:ktbge4nnpvfjbmejor5xwkme7u

Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition [article]

Rodolfo Zevallos, Luis Camacho, Nelsi Melgarejo
2022 arXiv   pre-print
The Huqariq corpus is a multilingual collection of speech from native Peruvian languages.  ...  The corpus has 220 hours of transcribed audio recorded by more than 500 volunteers, making it the largest speech corpus for native languages in Peru.  ...  Concluding remarks We have presented Huqariq: a multilingual speech corpus of Peruvian native languages for the development of speech recognition tools.  ... 
arXiv:2207.05498v1 fatcat:guw3g2gxorczrpgcblotapqgui

ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment [article]

Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin
2019 arXiv   pre-print
In this paper, a multilingual speech corpus (ATCSpeech) from real ATC systems, including accented Mandarin Chinese and English, is built and released to encourage the non-commercial ASR research in ATC  ...  To our best knowledge, this is the first work that aims at building a real and multilingual ASR corpus for the air traffic related research.  ...  In short, our database is a multilingual industrial ASR corpus in ATC domain. The raw data is monaural speech with 8000 Hz sample rate and 16 bits sample size.  ... 
arXiv:1911.11365v1 fatcat:vmodvedmevd7zk6tzajybuwn5q

Model-Agnostic Fast Adaptive Multi-Objective Balancing Algorithm for Multilingual Automatic Speech Recognition Model Training

Jiabin Xue, Tieran Zheng, Jiqing Han
2021 Conference of the International Speech Communication Association  
The model trained by MAFA outperforms the baseline model on the Common Voice corpus.  ...  This paper regards multilingual automatic speech recognition model training as a multi-objective problem because learning different languages may conflict, necessitating a trade-off.  ...  Voice corpus [40], a massively-multilingual collection of transcribed speech intended for speech technology research.  ...
doi:10.21437/interspeech.2021-355 dblp:conf/interspeech/XueZH21 fatcat:pwj7xbgpufcefayfqzu24q2fcu

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task [article]

Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal
2021 arXiv   pre-print
In some translation directions, our speech translation results evaluated on the public Multilingual TEDx test set are even comparable with the ones from a strong text-to-text translation system, which  ...  In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task.  ...  The audio corpora used in our experiments include Common Voice and Multilingual LibriSpeech (MLS). • Common Voice (Ardila et al., 2020).  ...
arXiv:2107.06959v2 fatcat:ubwxhxiiivgcfoktasexnv4umm

MAESTRO: Matched Speech Text Representations through Modality Matching [article]

Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen
2022 arXiv   pre-print
We establish a new state-of-the-art (SOTA) on VoxPopuli multilingual ASR with a 8% relative reduction in Word Error Rate (WER), multidomain SpeechStew ASR (3.7% relative) and 21 languages to English multilingual  ...  Learning aligned representations from unpaired speech and text sequences is a challenging task.  ...  Multilingual ASR: Following [11], we use 429k hours of public unlabeled speech corpora: VoxPopuli [33], Common-Voice [34], MLS [35] and BABEL [36] to pre-train multilingual ASR models.  ...
arXiv:2204.03409v2 fatcat:groj6z3xxvfffb626kkm4stmve

Pseudo-Labeling for Massively Multilingual Speech Recognition [article]

Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
2022 arXiv   pre-print
In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages.  ...  Experiments on the labeled Common Voice and unlabeled VoxPopuli datasets show that our recipe can yield a model with better performance for many languages that also transfers well to LibriSpeech.  ...  In this work, we go beyond the monolingual setting and demonstrate the use of pseudo-labeling to improve a massively multilingual speech recognizer trained on all 60 languages of the Common Voice dataset  ... 
arXiv:2111.00161v3 fatcat:zbyye7vwtbfdjnjtsautwbqlnm

ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment

Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Min Ruan, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin
2020 Interspeech 2020  
In this paper, a multilingual speech corpus (ATCSpeech) from real ATC systems, including accented Mandarin Chinese and English speeches, is built and released to encourage the noncommercial ASR research  ...  To our best knowledge, this is the first work that aims at building a real and multilingual ASR corpus for the ATC related research.  ...  Data Features Summary Almost all the speeches in this corpus are collected from the voice record devices of real ATC systems in China.  ... 
doi:10.21437/interspeech.2020-1020 dblp:conf/interspeech/YangTCWRLYW020 fatcat:6gfq6z77hzadxgpuglgciocyn4

MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible [article]

Marcely Zanon Boito, William N. Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier
2020 arXiv   pre-print
The CMU Wilderness Multilingual Speech Dataset (Black, 2019) is a newly published multilingual speech dataset based on recorded readings of the New Testament.  ...  We name this corpus MaSS (Multilingual corpus of Sentence-aligned Spoken utterances).  ...  Use Case: Multilingual Speech Retrieval Task Baseline In this section we showcase the usefulness of our corpus on a multilingual setting.  ... 
arXiv:1907.12895v3 fatcat:giqdjrajgngvxabrbr7dhbzik4