The SADID Evaluation Datasets for Low-Resource Spoken Language Machine Translation of Arabic Dialects
Proceedings of the 28th International Conference on Computational Linguistics
Low-resource Machine Translation recently gained a lot of popularity, and for certain languages, it has made great strides. However, it is still difficult to track progress in other languages for which there is no publicly available evaluation data. In this paper, we introduce benchmark datasets for Arabic and its dialects. We describe our design process and motivations and analyze the datasets to understand their resulting properties. Numerous successful attempts use large monolingual corpora
doi:10.18653/v1/2020.coling-main.530 fatcat:vxrsok7rpndzpoz46fnxvkss3e