Hindi to English Machine Transliteration of Named Entities using Conditional Random Fields

Manikrao L Dhore, Shantanu K Dixit, Tushar D Sonwalkar
2012 International Journal of Computer Applications  
Machine transliteration has received significant research attention in recent years. In most cases, the source language has been English and the target language an Asian language. This paper focuses on Hindi to English machine transliteration of Indian named entities such as proper nouns, place names and organization names using conditional random fields (CRF). Hindi is the national language of India and is spoken by more than 500 million Indians. Hindi is the world's fourth most commonly used language after Chinese, English and Spanish. The system takes an Indian place name as input in Hindi, written in Devanagari script, and transliterates it into English. The input is provided to the system in syllabified form so that n-gram techniques can be applied. Because more than 50% of named entities are formed from two or three syllabic units, an n-gram approach with Hindi unigrams, bigrams and trigrams is used for training on the corpus. The system performs satisfactorily with trigrams, compared with unigrams and bigrams.
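As a rough illustration of how syllabified input lends itself to n-gram techniques, the sketch below builds unigram, bigram and trigram features over syllabic units. It is not the authors' feature template: the function name `ngram_features`, the `|` separator, the feature keys and the example split of the place name are all assumptions made for illustration, and the CRF training step itself is not shown.

```python
def ngram_features(units, n_max=3):
    """Return one feature dict per syllabic unit.

    For the unit at position i, the dict holds the n-grams (n = 1..n_max)
    that end at position i, as far as the left context allows.
    """
    features = []
    for i in range(len(units)):
        feats = {}
        for n in range(1, n_max + 1):
            start = i - n + 1
            if start < 0:
                break  # not enough left context for this n
            feats[f"{n}-gram"] = "|".join(units[start:i + 1])
        features.append(feats)
    return features


# Hypothetical, hand-made syllabification of a place name (assumed, not
# produced by the paper's syllabification step).
units = ["ना", "ग", "पु", "र"]
features = ngram_features(units)

for unit, feats in zip(units, features):
    print(unit, feats)
```

Each feature dict could then be paired with the English transliteration of its unit and passed to a CRF toolkit as one element of a labeled sequence; the choice of toolkit is left open here.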
doi:10.5120/7522-0624 fatcat:zgovvjg3m5ah7kqmz5zwwo6u74