Formalization of morphological rules of inflection in the Kazakh language

G Bekmanova, A Sharipbayev, A Buribayeva, L Gumilyov
This work formalizes the rules for adding endings to the stems of Kazakh words; these rules generate, from a single word, many word forms that differ from each other only in their endings. Since the formal grammars of the Chomsky hierarchy, which are intended to describe the structure of linguistic units, do not allow semantic features to be taken into account, semantic neural networks are used for this purpose.
The problem of Natural Language Processing (NLP) arose almost immediately after the appearance of the first computers and remains as urgent as ever. Automatic inflection systems open up broad possibilities for advanced study of a language's lexical composition, increase competence, and contribute to faster and more efficient work with texts containing new and unexamined words.
Kazakh is a typical Turkic language that has preserved most of the features common to that group and has a number of characteristic Kipchak peculiarities. Its structural-typological character stems mainly from its being an agglutinative language. As a rule, the agglutinative type is described by a number of characteristic features covering phonetic as well as morphological and syntactic peculiarities.
Kazakh is characterized by a multitude of word forms for each word, formed by adding suffixes and endings to its end. Suffixes relate to the semantic category, and when new words are formed they often change the part of speech of the underlying word or stem. For example, the indivisible verb root "zhaz" (write), with the suffix "u" added, becomes the noun "zhaz+u" (letter) or another verb "zhaz+u" (write), and adding a further suffix "shy" to the latter turns it into the noun "zhazu+shy" (writer). By contrast, adding an ending does not change the part of speech of the stem (the indivisible root plus suffixes). For example, the ending "lar" forms the plural of a noun: "zhazu+lar" (letters), "zhazushy+lar" (writers). This work examines only the rules for adding endings, which generate from a single word many word forms that differ from each other only in their endings. This makes it possible to create a dictionary of word forms automatically, by generating them with the ending-addition rules.
Ending addition is subject to the vowel harmony rule for sounds and syllables, which stipulates that a soft or a hard ending is added depending on whether the word stem (the indivisible root, or the root with a suffix) is soft or hard. To formalize the rules for adding endings, it is proposed to use a semantic neural network that implements the synthesis of Kazakh word forms and makes it possible to generate the structure of the dictionary of initial forms as a synchronized linear tree. This determines the characteristic features (Fig. 1) on the basis of which an ending is added.
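
The hard/soft choice described above can be illustrated with a short rule-based sketch in Python. It is not the semantic neural network proposed in this work, only a plain function that makes the vowel harmony condition concrete. The vowel classes, the function names (`is_hard`, `add_ending`, `word_forms`) and the soft plural variant "ler" are assumptions introduced for illustration. The sketch classifies a stem by its last non-neutral vowel in the Latin transliteration used in the examples, and it ignores the consonant-conditioned variants of endings, so it is reliable only for stems like those shown.

```python
# Assumed vowel classes for the Latin transliteration used in the examples.
# These sets are hypothetical; the source text does not define them.
HARD_VOWELS = set("aoy")  # assumed hard (back) vowels
SOFT_VOWELS = set("ei")   # assumed soft (front) vowels
# Every other letter, including "u", is treated as neutral.


def is_hard(stem: str) -> bool:
    """Return True if the last non-neutral vowel of the stem is hard.

    Stems without any classified vowel default to hard (an assumption).
    """
    for ch in reversed(stem.lower()):
        if ch in HARD_VOWELS:
            return True
        if ch in SOFT_VOWELS:
            return False
    return True


def add_ending(stem: str, hard_ending: str, soft_ending: str) -> str:
    """Attach the hard or soft variant of an ending according to the stem."""
    return stem + (hard_ending if is_hard(stem) else soft_ending)


def word_forms(stems, endings):
    """Build a small dictionary of word forms: stem -> list of generated forms.

    `endings` is a list of (hard_variant, soft_variant) pairs.
    """
    return {
        stem: [add_ending(stem, hard, soft) for hard, soft in endings]
        for stem in stems
    }


# Plural ending from the text ("lar") with its assumed soft counterpart ("ler").
PLURAL = ("lar", "ler")

forms = word_forms(["zhazu", "zhazushy"], [PLURAL])
print(forms)  # {'zhazu': ['zhazular'], 'zhazushy': ['zhazushylar']}
```

Both stems from the examples are classified as hard and therefore take "lar". A fuller treatment would add further ending pairs to the `endings` list and would choose the initial consonant of the ending according to the final sound of the stem.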