EDRAK: Entity-Centric Data Resource for Arabic Knowledge

Mohamed H. Gad-elrab, Mohamed Amir Yosef, Gerhard Weikum
2015 Proceedings of the Second Workshop on Arabic Natural Language Processing  
Online Arabic content is growing rapidly, but Arabic structured resources have not grown at a comparable pace. Systems that perform standard Natural Language Processing (NLP) tasks such as Named Entity Disambiguation (NED) struggle to deliver decent quality due to the lack of rich Arabic entity repositories. In this paper, we introduce EDRAK, an automatically generated, comprehensive Arabic entity-centric resource. EDRAK contains more than two million entities together with their Arabic names and [...] keyphrases. Manual evaluation confirmed the quality of the generated data. We are making EDRAK publicly available as a valuable resource to help advance research in Arabic NLP and IR tasks such as dictionary-based Named-Entity Recognition, entity classification, and entity summarization.
doi:10.18653/v1/w15-3224 dblp:conf/wanlp/Gad-ElrabYW15 fatcat:ot4im7rl2rg4lf4j6hglmdpswq