Exploring Semantic Relatedness in Arabic Corpora using Paradigmatic and Syntagmatic Models

Adil Toumouh, Dominic Widdows, Ahmed Lehireche
2016 International Journal of Information Engineering and Electronic Business  
In this paper we explore two paradigms: firstly, paradigmatic representation via the native HAL model, including a variant enriched with word order information using the permutation technique of Sahlgren et al. [21]; and secondly, syntagmatic representation via a words-by-documents model constructed with the Random Indexing method. We demonstrate that these word space models, which were initially designed to extract similarity, can also be efficient for extracting relatedness from Arabic corpora. For a given word, the proposed models search for the words related to it. A result is counted as a failure when the number of related words returned by a model is less than or equal to 4; otherwise it is counted as a success. To decide whether a word is related to another, we rely on an expert in the economic domain and on a glossary of the domain. We first compare a native HAL model with a term-document model; the simple HAL model records the better result, with a success rate of 72.92%. In a second stage, we boost the HAL model results by adding word order information via the permutation technique of Sahlgren et al. [21]. The success rate of the enriched HAL model reaches 79.2%.
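To make the three word spaces concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The dimensionality (`DIM`), sparsity (`NONZERO`), window size, the choice to approximate the HAL-style sliding window with Random Indexing vectors, and every function name are hypothetical. The permutation step follows the general idea of Sahlgren et al. [21]: a neighbour's index vector is cyclically shifted by its signed offset before being added, so word order leaves a trace in the resulting vector.

```python
import math
import random
from collections import defaultdict

DIM = 512     # hypothetical dimensionality of the reduced space
NONZERO = 8   # hypothetical number of +1/-1 entries per index vector


def index_vector(rng, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random index vector, as used in Random Indexing."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((1, -1))
    return vec


def rotate(vec, k):
    """Permutation Pi^k, implemented here as a cyclic shift by k positions."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def add_into(target, vec, weight=1.0):
    """target += weight * vec, touching only the non-zero entries of vec."""
    for d, v in enumerate(vec):
        if v:
            target[d] += weight * v


def window_model(sentences, window=2, permute=False, seed=0):
    """HAL-style sliding-window word space, approximated with Random Indexing.

    Closer neighbours get a larger weight (window - distance + 1), as in HAL.
    With permute=True, each neighbour's index vector is rotated by its signed
    offset, so "A before B" and "B before A" leave different traces.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: index_vector(rng) for w in vocab}
    context = {w: [0.0] * DIM for w in vocab}
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j == i:
                    continue
                weight = window - abs(j - i) + 1
                vec = rotate(index[sent[j]], j - i) if permute else index[sent[j]]
                add_into(context[word], vec, weight)
    return context


def document_model(documents, seed=1):
    """Words-by-documents space: a word sums the index vectors of its documents."""
    rng = random.Random(seed)
    context = defaultdict(lambda: [0.0] * DIM)
    for doc in documents:
        doc_vec = index_vector(rng)
        for word in doc:
            add_into(context[word], doc_vec)
    return dict(context)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def neighbours(space, word, k=10):
    """The k words whose vectors are closest (by cosine) to the query word."""
    scored = [(cosine(space[word], vec), other)
              for other, vec in space.items() if other != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]


def is_success(related_count, threshold=4):
    """The abstract's criterion: 4 or fewer related words is a failure."""
    return related_count > threshold


def success_rate(related_counts):
    """Percentage of query words counted as successes."""
    return 100.0 * sum(map(is_success, related_counts)) / len(related_counts)
```

In an evaluation like the one described above, `related_count` for a query word would be the number of words returned by `neighbours(...)` that the domain expert or the glossary judges related; the sketch deliberately leaves that human judgement outside the code.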
doi:10.5815/ijieeb.2016.01.05