CrowdSpeech and VoxDIY: Benchmark Datasets for Crowdsourced Audio Transcription [article]

Nikita Pavlichenko, Ivan Stelmakh, Dmitry Ustalov
2021 arXiv pre-print
For this, we collect and release CrowdSpeech -- the first publicly available large-scale dataset of crowdsourced audio transcriptions. ... In that, we design a principled pipeline for constructing datasets of crowdsourced audio transcriptions in any novel domain. ... Acknowledgements & Author Contributions: I.S. and N.P. designed the setup of the problem and the data-collection pipeline. N.P. collected data on Toloka. N.P. and D.U. conducted data analysis. ...
arXiv:2107.01091v2 fatcat:l6dvlvq4fzdubjyc3sd2icwqf4