Leveraging Knowledge-Based Features With Multilevel Attention Mechanisms for Short Arabic Text Classification

Iyad Alagha
2022 IEEE Access  
With the wide spread of short texts through social media platforms, there is a growing need for effective methods for short-text classification. However, short-text classification has always been challenging due to the ambiguity and data sparsity of short text. A common solution is to enrich the short text with additional semantic features extracted from external knowledge, such as Wikipedia, to help the classifier better decide on the correct class. Most existing works focused on text written in English and benefited from the existence of entity-linking tools built on English knowledge bases. When it comes to the Arabic language, the exploitation of external knowledge to support the classification of Arabic short text has not been widely explored. This work presents an approach for the classification of short Arabic text that exploits both Wikipedia-based features and the attention mechanism for effective classification. First, Wikipedia entities mentioned in the short text are identified. Then, Wikipedia categories associated with the identified entities are retrieved and filtered to retain only the most relevant categories. A deep learning model with multiple attention mechanisms is then used to encode the short text and the associated category set. Finally, the short text and category representations are combined and fed into the classification layer. Using the attentive model together with category filtering highlights the most important features while reducing the effect of improper features. The proposed model is evaluated by comparing it with several deep learning models.
doi:10.1109/access.2022.3175306 fatcat:jwn327xofzdyjpdcmrgo6mjyhy
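
The pipeline outlined in the abstract (filter the categories, attend over them, combine with the text representation) can be illustrated with a minimal NumPy sketch. This is not the paper's model: the abstract does not specify the filtering criterion or the attention architecture, so the cosine-similarity top-k filter, the scaled dot-product attention, the random vectors standing in for learned embeddings, and every function name below are assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def filter_categories(text_vec, cat_vecs, top_k):
    # Assumed filter: keep the top_k categories most similar to the text
    # by cosine similarity. The paper's actual criterion may differ.
    norms = np.linalg.norm(cat_vecs, axis=1) * np.linalg.norm(text_vec)
    sims = (cat_vecs @ text_vec) / np.maximum(norms, 1e-12)
    keep = np.argsort(-sims)[:top_k]
    return np.sort(keep)

def attend(query, keys):
    # Scaled dot-product attention with the text vector as the query;
    # returns the weighted sum of the keys and the attention weights.
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ keys, weights

def combine(text_vec, cat_vecs, top_k=2):
    # Filter categories, pool them with attention, then concatenate the
    # pooled category vector with the text vector for a classifier.
    keep = filter_categories(text_vec, cat_vecs, top_k)
    cat_repr, weights = attend(text_vec, cat_vecs[keep])
    return np.concatenate([text_vec, cat_repr]), keep, weights

# Hypothetical inputs: one 8-dimensional text vector, five category vectors.
rng = np.random.default_rng(0)
text_vec = rng.normal(size=8)
cat_vecs = rng.normal(size=(5, 8))
features, keep, weights = combine(text_vec, cat_vecs, top_k=3)
```

In this sketch, `features` is the concatenated vector that would be passed to a classification layer, and `weights` shows how much each retained category contributes to the category representation.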