Implementing Semantic Annotation in a Ukrainian Corpus

Vasyl Starko
2021 International Conference on Computational Linguistics and Intelligent Systems  
The paper describes the first phase of semantic annotation implemented in the General Regionally Annotated Corpus of Ukrainian (GRAC) using the Ukrainian Semantic Lexicon (USL) and the TagText tagger for Ukrainian. More than 1,000 of the most frequent lemmas were supplied with semantic tags, creating the foundation for the lexicon. In the process of developing the USL, the original semantic tagset was revised and expanded. The revised tagset is presented, and the linguistic aspects of practical semantic annotation are analyzed. The TagText tagger was updated to enable both morphological and semantic annotation of Ukrainian texts. The current versions of the USL and TagText have been released and are available for download. Text coverage by semantic tags in GRAC is discussed, and examples of semantic and complex searches in the GRAC corpus are provided. Plans for future work on the USL are outlined.
dblp:conf/colins/Starko21 fatcat:yj3fnvdnwvctrc3wpblzbavybe