AIDArabic: A Named-Entity Disambiguation Framework for Arabic Text

Mohamed Amir Yosef, Marc Spaniol, Gerhard Weikum
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014
Recently, there has been great progress in automatically generated knowledge bases and in disambiguation systems that map text mentions onto canonical entities. Such efforts have enabled researchers and analysts from various disciplines to semantically "understand" content. However, most approaches have been designed specifically for English, and support for Arabic in particular is still in its infancy. Since the amount of Arabic Web content (e.g. in social media) has increased dramatically in recent years, we see great potential for work that supports entity-level analytics of these data. To this end, we have developed a framework called AIDArabic that extends the existing AIDA system with additional components for disambiguating Arabic text, based on an automatically generated knowledge base distilled from Wikipedia. Furthermore, we overcome the persistent sparsity of the Arabic Wikipedia by exploiting the interwiki links between Arabic and English Wikipedia content, thus enriching both the entity catalog and the disambiguation context.
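The interwiki-based enrichment described in the abstract can be illustrated with a minimal sketch. It assumes that each Wikipedia edition yields entities with a set of names and a set of context keyphrases, and that interwiki links give a mapping from Arabic entity IDs to English entity IDs. The names `Entity`, `build_catalog`, and `ar_to_en` and the `ar:`/`en:` ID prefixes are hypothetical illustrations, not AIDArabic's actual data model or code. Linked entities are merged, so an Arabic entity gains English context; entities present in only one edition are kept as they are.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical entity record: an ID, surface names, and context keyphrases.
    entity_id: str
    names: set[str] = field(default_factory=set)
    keyphrases: set[str] = field(default_factory=set)


def build_catalog(
    ar: dict[str, Entity],
    en: dict[str, Entity],
    ar_to_en: dict[str, str],
) -> dict[str, Entity]:
    """Merge Arabic and English entities via an interwiki mapping (sketch).

    Linked entities are keyed by their English ID and receive the union of
    names and keyphrases from both editions. Unlinked entities keep their
    own ID. Input entities are copied, not mutated.
    """
    catalog: dict[str, Entity] = {}
    # Start from the (typically larger) English catalog.
    for en_id, ent in en.items():
        catalog[en_id] = Entity(en_id, set(ent.names), set(ent.keyphrases))
    # Fold in Arabic entities, using the interwiki link when one exists.
    for ar_id, ent in ar.items():
        key = ar_to_en.get(ar_id, ar_id)
        target = catalog.setdefault(key, Entity(key))
        target.names |= ent.names
        target.keyphrases |= ent.keyphrases
    return catalog


if __name__ == "__main__":
    ar = {"ar:باريس": Entity("ar:باريس", {"باريس"}, {"فرنسا"})}
    en = {"en:Paris": Entity("en:Paris", {"Paris"}, {"France", "Eiffel Tower"})}
    merged = build_catalog(ar, en, {"ar:باريس": "en:Paris"})
    print(sorted(merged["en:Paris"].names))
```

Keying linked entities by their English ID is only one possible design choice. It keeps a single canonical identifier per real-world entity, so the Arabic mention "باريس" can be scored against the richer English keyphrase context.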
doi:10.3115/v1/w14-3626 dblp:conf/wanlp/YosefSW14 fatcat:c5nwuwcwazeglkiowgfi427oya