AUTOMATIC BUILDING ARABIC DOMAIN MORPHOLOGICAL DICTIONARY USING PART OF SPEECH TAGGING

O. El Barbary
2018 Journal of the Egyptian Mathematical Society  
The Arabic language still poses difficulties for automatic processing because of its richness, morphology, phonetics and lexicon. This paper presents a new strategy for building a morphological field dictionary for the Arabic language. Our strategy is divided into two parts. The first extracts efficient Field Association (FA) words for each specific domain. The second generates Part Of Speech (POS) tags for these FA words and collects them in one frame. After that, the FA words with its
... ame collected in alphabetic order. The method of building the automatic morphological field dictionary using a main algorithm is discussed and studied. The advantage of our approach is that it builds an extended and updated automatic Arabic morphological field dictionary. The average accuracy (F-measure) of the experimental results is up to 76%.

Recently, the amount of information of all kinds available electronically has increased rapidly, so there is a huge need to search and organize enormous amounts of information in text documents. Text searching is one of the most essential operations in information retrieval (IR) systems. With the extensive use of the internet and its powerful searching capabilities and applications, its importance has grown considerably in the last few years. One of the main problems intrinsic to free-text searching is the variation encountered in word forms due to derivational and inflectional requirements. Hence, a simple matching process is inadequate for efficient information retrieval. This has led us to devise and develop other techniques for improving search performance. Field Association (FA) words are a new technique for selecting efficient words that can be related to a specified field. A person can recognize a field such as mathematics by finding any of these words: quantity, structure, space, change, deduction, abstraction, counting, calculation and measurement. Readers generally identify the subject of a text when they notice specific terms, called field association terms [1, 2, 3].

Arabic is the most commonly spoken language after Chinese, probably with approximately 422 million native speakers. The rich morphology of Arabic and its more complex word formation have led Arabic IR research to depend on Arabic morphology, which has become an integral part of many Arabic information retrieval systems. Arabic offers special challenges for data-driven approaches.
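The field-recognition idea described above, where spotting words such as "quantity" or "calculation" points to mathematics, can be illustrated with a minimal sketch. It is not the paper's algorithm: the `guess_field` function, the `FA_WORDS` table and the "sports" word list are hypothetical names and data introduced only for illustration, and the counting rule (pick the field with the most matching words) is an assumption. Only the mathematics word list comes from the text.

```python
import re
from collections import Counter

# Hypothetical FA-word table. The "mathematics" entries are the example
# words given in the text; the "sports" entries are invented for contrast.
FA_WORDS = {
    "mathematics": {
        "quantity", "structure", "space", "change", "deduction",
        "abstraction", "counting", "calculation", "measurement",
    },
    "sports": {"match", "goal", "team", "player"},
}


def guess_field(text, fa_words=FA_WORDS):
    """Return the field whose FA words occur most often in text, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for field, words in fa_words.items():
        # Count how many tokens are FA words of this field.
        hits[field] = sum(1 for token in tokens if token in words)
    field, count = hits.most_common(1)[0]
    # No FA word found at all: the field cannot be decided.
    return field if count > 0 else None


print(guess_field("The calculation of a quantity requires measurement."))
```

A real system along the lines of the paper would work on Arabic tokens and would first have to extract the FA words for each domain; this sketch assumes that table already exists.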
An Arabic word consists of a stem with a consonantal root and a pattern. Furthermore, it contains affixes and vowels, and sometimes the same root with different vowels stands for different meanings. Most previous work in Arabic information retrieval (AIR) depends on stems [4]. Stemming is a tool used in IR to combat the vocabulary mismatch problem. It requires deleting the vowels, which is a big mistake because many words become identical although they differ in meaning. The Arabic language has a special characteristic that distinguishes it from other languages. Most languages construct words out of morphemes that are simply concatenated one after another, for example un + fail + ing. In these languages like
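The vowel-deletion problem described above can be made concrete with a small sketch. It assumes that "vowels" refers to the short-vowel diacritics (harakat, Unicode U+064B to U+0652); the three example words and the `strip_diacritics` helper are illustrative choices, not taken from the paper.

```python
import re

# Arabic short-vowel and related diacritic marks (fathatan .. sukun).
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_diacritics(word):
    """Remove the diacritic marks from an Arabic word."""
    return HARAKAT.sub("", word)


# Three vocalized words built on the same consonants (ain, lam, meem),
# written with escapes so the diacritics are explicit.
WORDS = {
    "\u0639\u064E\u0644\u0650\u0645\u064E": "he knew",   # 'alima
    "\u0639\u0650\u0644\u0652\u0645": "knowledge",       # 'ilm
    "\u0639\u064E\u0644\u064E\u0645": "flag",            # 'alam
}

# After stripping, all three collapse to the same bare string.
bare_forms = {strip_diacritics(word) for word in WORDS}
print(len(WORDS), "vocalized words ->", len(bare_forms), "bare form")
```

Once the marks are gone, the three meanings can no longer be told apart by string matching alone, which is the conflation the text warns about.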
doi:10.21608/joems.2018.2984.1053 fatcat:edrhnequdrgx3bp7mfwwjwvpzu