BAAC: Bangor Arabic Annotated Corpus

Ibrahim S Alkhazi, William J.
<span title="">2018</span> <i title="The Science and Information Organization"> <a target="_blank" rel="noopener" href="" style="color: black;">International Journal of Advanced Computer Science and Applications</a> </i> &nbsp;
This paper describes the creation of the Bangor Arabic Annotated Corpus (BAAC), a new Modern Standard Arabic (MSA) corpus comprising 50K words manually annotated with part-of-speech tags. To evaluate the quality of the corpus, the Kappa coefficient and the direct percent agreement for each tag were calculated, giving a Kappa value of 0.956 and an average observed agreement of 94.25%. The corpus was used to evaluate the widely used Madamira Arabic part-of-speech tagger and to further investigate compression models for text compressed using part-of-speech tags. A new annotation tool was also developed and used for the annotation of BAAC.
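The two quality measures mentioned above can be illustrated with a minimal sketch. The abstract does not say which Kappa variant was used, so this assumes Cohen's kappa over two annotators' tag sequences; the function names and the toy tag sequences are hypothetical and are not taken from the paper or from BAAC.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of tokens on which the two annotators chose the same tag."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa (assumed variant): (p_o - p_e) / (1 - p_e)."""
    n = len(tags_a)
    p_o = observed_agreement(tags_a, tags_b)
    count_a = Counter(tags_a)
    count_b = Counter(tags_b)
    # Chance agreement: probability that both annotators pick the same tag
    # if each tags independently according to their own tag frequencies.
    p_e = sum(count_a[tag] * count_b[tag] for tag in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical tag throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example (hypothetical tags, not BAAC data).
annotator_a = ["noun", "verb", "noun", "prep"]
annotator_b = ["noun", "verb", "adj", "prep"]
agreement = observed_agreement(annotator_a, annotator_b)
kappa = cohen_kappa(annotator_a, annotator_b)
```

Kappa discounts the agreement that would be expected by chance, which is why it can differ noticeably from the raw percent agreement when a few tags dominate the data.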
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="">doi:10.14569/ijacsa.2018.091120</a> <a target="_blank" rel="external noopener" href="">fatcat:bbrxyukzbvahjbrkhjvmbpb7hm</a> </span>
