A Novel Approach to Arabic Keyphrase Extraction
Innovative Computing Information and Control Express Letters, Part B: Applications
Keyword extraction is one of the most important research areas of information retrieval. The task is challenging and has received considerable attention from researchers over the last decade. Its importance stems from the fact that extracted keywords can be used in many fields, such as document indexing, clustering, classification, summarization, metadata generation, topic identification, and information visualization. In addition, recent years have witnessed a dramatic growth in the number of documents that are available online with no keyphrases assigned. Assigning keyphrases to such documents manually is impractical, so automatic keyphrase extraction is needed. Several approaches have been proposed in the literature, using techniques borrowed from areas such as machine learning, computational linguistics, and statistical analysis. In this paper, a keyphrase extraction system is developed for Arabic documents. A new boosting factor is proposed by which the occurrence count of a compound term is boosted based on the occurrences of its constituent words. This is motivated by the fact that longer phrases are preferred as keywords over single words. The performance of the proposed keyphrase extraction method is evaluated using three Arabic datasets, and the results show that its performance is comparable to that of KP-Miner.
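The abstract does not give the formula for the boosting factor, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that a compound term's raw count is increased by a fraction of the average count of its constituent words. The function name `boosted_phrase_counts`, the weight `alpha`, and the averaging rule are all hypothetical, and the tokens are Latin placeholders standing in for preprocessed Arabic words.

```python
from collections import Counter


def boosted_phrase_counts(tokens, max_len=3, alpha=0.1):
    """Count multi-word candidates and boost each by its words' frequencies.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        boosted = phrase_count + alpha * mean(word_count of each word)
    """
    word_counts = Counter(tokens)

    # Count every contiguous n-gram of length 2..max_len as a candidate phrase.
    phrase_counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase_counts[tuple(tokens[i:i + n])] += 1

    boosted = {}
    for phrase, count in phrase_counts.items():
        mean_word_count = sum(word_counts[w] for w in phrase) / len(phrase)
        boosted[phrase] = count + alpha * mean_word_count
    return boosted


# Placeholder tokens; a real system would tokenize and normalize Arabic text first.
tokens = "a b a b c".split()
scores = boosted_phrase_counts(tokens, max_len=2, alpha=0.5)
for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(phrase), score)
```

Under this assumed rule, a phrase whose words are frequent throughout the document ranks above an equally frequent phrase built from rare words, which is one way to favor compound terms over single words.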