Arabic Dialect Identification with Deep Learning and Hybrid Frequency Based Features

Youssef Fares, Zeyad El-Zanaty, Kareem Abdel-Salam, Muhammed Ezzeldin, Aliaa Mohamed, Karim El-Awaad, Marwan Torki
2019 Proceedings of the Fourth Arabic Natural Language Processing Workshop  
Studies on Dialectal Arabic are growing more important by the day as it becomes the primary written and spoken form of Arabic online in informal settings. Among the important problems to explore is dialect identification. This paper describes different techniques that can be applied toward this goal and reports their performance on the Multi Arabic Dialect Applications and Resources (MADAR) Arabic Dialect Corpora. Our results show that improving on traditional systems ... frequency-based features and non-deep-learning classifiers is a challenging task. We propose different models based on different word and document representations. Our top model achieves a macro-averaged F1 score of 65.66 on MADAR's small-scale parallel corpus of 25 dialects and Modern Standard Arabic (MSA).
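For readers unfamiliar with the metric, macro-averaged F1 is the unweighted mean of the per-class F1 scores, so each dialect counts equally regardless of how many sentences it has. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `macro_f1` and the toy labels are illustrative assumptions, and the score of 65.66 above is presumably reported on a 0 to 100 scale.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives and false negatives per label.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the true label but was missed

    # Average over every label seen in either the gold or predicted lists.
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels for illustration only; not taken from the MADAR corpus.
gold = ["CAI", "CAI", "RAB", "MSA"]
predicted = ["CAI", "RAB", "RAB", "MSA"]
print(round(100 * macro_f1(gold, predicted), 2))  # prints 77.78
```

In this toy case the per-class F1 scores are 2/3, 2/3 and 1, so their mean is 7/9, or 77.78 on a 0 to 100 scale.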
doi:10.18653/v1/w19-4626 dblp:conf/wanlp/FaresEAEMET19 fatcat:bur2abyj2veabb7bn5duttfarq