A simple Galois Power-of-Two real time embedding scheme for performing Arabic morphology deep learning tasks

Mohammed A. ELAffendi, Ibrahim Abuhaimed, Khawla AlRajhi
2020 Egyptian Informatics Journal  
This paper describes how a simple, novel Galois Power-of-Two (GPOW2) real-time embedding scheme is used to improve the performance and accuracy of downstream NLP tasks. GPOW2 computes embeddings on the fly (in real time) in the context of the target NLP task, without the need for tabulated pre-embeddings. A notable feature of the method is its ability to capture multilevel embeddings in a single pass: it simultaneously computes character, word, and sentence embeddings. GPOW2 has been derived in the context of attempts to improve the performance of the SWAM Arabic morphological engine, a multipurpose tool that supports segmentation, classification, POS tagging, spell checking, word embeddings, and semantic search, among other tasks. SWAM is a pattern-oriented algorithm that relies on morphological patterns and POS tagging to perform NLP tasks. The paper demonstrates how GPOW2 led to improvements in the accuracy of POS tagging and pattern matching, and accordingly in the performance of the whole engine. The accuracy is 99.47% for pattern prediction and 98.80% for POS tagging.
© 2020 The Authors. Published by Elsevier BV on behalf of Faculty of Computers and Artificial Intelligence, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
doi:10.1016/j.eij.2020.03.002 fatcat:72tunevhcze7dm42fpeaa3zeby