Current Project Work on English to Kannada Machine Translation System: A Literature Survey on NLP

Mr Chethan, Chandra Basavaraddi, H Shashirekha
Language processing refers to the way human beings use words to communicate ideas and feelings, and to how such communications are processed and understood. Most information on the internet, such as news, weather reports, annual reports, and technical and scientific books, is available only in a select number of languages, especially English. People who know only local languages may therefore be unable to use these resources even though they are available. Indian languages have a lesser amount of data available over the internet. The alphabets of all Indian languages have evolved over a period of about 2000 years from the Brahmi script, which was present in India earlier than the 6th century B.C. Brahmi is a semi-alphabetic script in the sense that, when words are formed from its letters, separate vowel signs are attached to the consonants and distinct symbols are used to denote conjunct consonants. This method is still followed in the scripts of nearly all Indian languages, but each of these scripts involves about 100 distinct characters. There is a dire need to simplify these scripts, for two reasons. One is that the large number of characters stands in the way of high literacy; the other is that all modern communication involves the computer at one stage or another, and the computer can be used with high efficiency only if the total number of characters in the script of a language is reduced. Simplification of the scripts of all Indian languages is therefore necessary for achieving high literacy and efficiency in computer communication. Kannada, a South Dravidian language, has almost 40 million speakers, its own independent script, and a long documented history. The present Kannada script has more than 110 symbols in all: 48 symbols for the 48 letters of the alphabet, 13 vowel signs for kagunita, 34 vottakshara signs, and so on. Even though Kannada is a language rich in literature, its resources are poor when viewed through the prism of linguistics. The development of NLP for Kannada has not been explored much and is at a beginning stage compared with other Indian languages.
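The semi-alphabetic composition described above (a consonant combined with a vowel sign for kagunita, and consonants joined into a conjunct for vottakshara) can be illustrated with a minimal Python sketch. It assumes the Unicode encoding of Kannada, which the survey itself does not discuss, and the helper names `with_vowel_sign`, `conjunct`, and `describe` are hypothetical, chosen only for this illustration.

```python
import unicodedata

# Code points from the Unicode Kannada block (an assumption of this sketch;
# the survey does not prescribe any particular encoding).
KA = "\u0C95"       # KANNADA LETTER KA
SIGN_AA = "\u0CBE"  # KANNADA VOWEL SIGN AA
VIRAMA = "\u0CCD"   # KANNADA SIGN VIRAMA (suppresses the inherent vowel)


def with_vowel_sign(consonant: str, vowel_sign: str) -> str:
    """Kagunita form: a consonant followed by a dependent vowel sign."""
    return consonant + vowel_sign


def conjunct(first: str, second: str) -> str:
    """Vottakshara form: two consonants joined through the virama."""
    return first + VIRAMA + second


def describe(text: str) -> list[str]:
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


kaa = with_vowel_sign(KA, SIGN_AA)  # one written syllable, two code points
kka = conjunct(KA, KA)              # one written cluster, three code points

if __name__ == "__main__":
    for label, text in (("kagunita", kaa), ("vottakshara", kka)):
        print(label, text, describe(text))
```

In this encoding each written syllable is stored as a short sequence of code points rather than as one dedicated symbol, which is one way a computer can represent a script with many visible shapes using a smaller set of stored characters.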