Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell

Djamé Seddah, Farah Essaidi, Amal Fethi, Matthieu Futeral, Benjamin Muller, Pedro Javier Ortiz Suárez, Benoît Sagot, Abhishek Srivastava
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics   unpublished
We introduce the first treebank for a romanized user-generated content variety of Algerian, a North-African Arabic dialect known for its frequent usage of code-switching.  ...  This is the first time that enough unlabeled and annotated data is provided for an emerging user-generated content dialectal language with rich morphology and code-switching, making it a challenging testbed  ...  Moreover, being made of user-generated content, this treebank covers a large variety of language variation among native speakers and displays a high level of code-switching.  ...
doi:10.18653/v1/2020.acl-main.107 fatcat:nzjlr2yy3rhgxlqq3e6oaksauy

The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus [article]

Samia Touileb, Jeremy Barnes
2021 arXiv   pre-print
We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts, as well as annotations for sentiment  ...  b) typologically similar, but use a distinct script, or c) are typologically similar and use the same script.  ...  Building a user-generated content North-African Arabizi treebank: Tackling hell.  ...  Evelyn, Richárd Farkas, Hector Fernandez Alcalde, Jennifer Foster, Cláudia Freitas, Kazunori  ...
arXiv:2105.07400v3 fatcat:naqrpvt5snhobitapvayyellfe