A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021.
The file type is application/pdf.
CrowdSpeech and VoxDIY: Benchmark Datasets for Crowdsourced Audio Transcription
[article]
2021
arXiv
pre-print
For this, we collect and release CrowdSpeech -- the first publicly available large-scale dataset of crowdsourced audio transcriptions. ...
In that, we design a principled pipeline for constructing datasets of crowdsourced audio transcriptions in any novel domain. ...
Acknowledgements & Author Contributions: I.S. and N.P. designed the setup of the problem and the data-collection pipeline. N.P. collected data on Toloka. N.P. and D.U. conducted data analysis. ...
arXiv:2107.01091v2
fatcat:l6dvlvq4fzdubjyc3sd2icwqf4