The file type is application/pdf.
MOROCO: The Moldavian and Romanian Dialectal Corpus
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
In this work, we introduce the Moldavian and Romanian Dialectal Corpus (MOROCO), which is freely available for download at https://github.com/butnaruandrei/MOROCO. The corpus contains 33564 samples of text (with over 10 million tokens) collected from the news domain. The samples belong to one of the following six topics: culture, finance, politics, science, sports and tech. The data set is divided into 21719 samples for training, 5921 samples for validation and another 5924 samples for testing.
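The split sizes quoted in the abstract can be checked with a little arithmetic. The following is a minimal sketch that uses only the numbers stated above; the names `SPLITS`, `TOTAL_SAMPLES`, `TOPICS`, and `split_fractions` are illustrative and are not taken from the MOROCO repository.

```python
# Figures copied from the abstract; all identifiers here are hypothetical.
SPLITS = {"train": 21719, "validation": 5921, "test": 5924}
TOTAL_SAMPLES = 33564
TOPICS = ["culture", "finance", "politics", "science", "sports", "tech"]


def split_fractions(splits):
    """Return each split's share of the total number of samples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


# The three splits should account for every sample in the corpus.
assert sum(SPLITS.values()) == TOTAL_SAMPLES

fractions = split_fractions(SPLITS)
for name, share in fractions.items():
    print(f"{name}: {SPLITS[name]} samples ({share:.1%})")
```

The three splits add up to the stated total of 33564 samples, which corresponds to roughly 65% for training and about 18% each for validation and testing.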
doi:10.18653/v1/p19-1068
dblp:conf/acl/ButnaruI19
fatcat:76dmf2o5rbftpmjj6tqvci43da