Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent

Anoop Kunchukuttan, Ratish Puduppully, Pushpak Bhattacharyya
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations  
We present Brahmi-Net, an online system for transliteration and script conversion for all major Indian language pairs (306 pairs). The system covers 13 Indo-Aryan languages, 4 Dravidian languages and English. To train the transliteration systems, we mined parallel transliteration corpora from parallel translation corpora using an unsupervised method and trained statistical transliteration systems on the mined corpora. Languages which do not have parallel corpora are supported by transliteration through a bridge language. Our script conversion system supports conversion between all Brahmi-derived scripts as well as the ITRANS romanization scheme. For this, we leverage co-ordinated Unicode ranges between Indic scripts and use an extended ITRANS encoding for transliterating between English and Indic scripts. The system also provides top-k transliterations and simultaneous transliteration into multiple output languages. We provide a Python API as well as a REST API to access these services. The API and the mined transliteration corpus are made available for research use under an open source license.
doi:10.3115/v1/n15-3017 dblp:conf/naacl/KunchukuttanPB15 fatcat:5uezs2orvrg27ibuhfnnkmbypi
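
The abstract's remark about co-ordinated Unicode ranges can be illustrated with a small sketch. The major Indic scripts each occupy a 128-code-point Unicode block, and corresponding characters largely sit at the same offset within each block, so a first approximation of script conversion is to shift code points from one block to another. The function name `convert_script` and the `SCRIPT_BASE` table below are hypothetical and written for illustration only; this is not the Brahmi-Net API, and a real system would need exception handling for offsets that are unassigned or script-specific in the target block.

```python
# Start of each script's 128-code-point Unicode block.
# Illustrative table, not taken from the Brahmi-Net code.
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def convert_script(text, source, target):
    """Shift every character of the source block to the same offset
    in the target block; leave all other characters unchanged."""
    src_base = SCRIPT_BASE[source]
    tgt_base = SCRIPT_BASE[target]
    out = []
    for ch in text:
        offset = ord(ch) - src_base
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(tgt_base + offset))
        else:
            # Spaces, digits, Latin letters, etc. pass through as-is.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Devanagari "kamal" rendered in Bengali script.
    print(convert_script("कमल", "devanagari", "bengali"))
```

Plain offset arithmetic is only a starting point: some positions are empty in certain scripts (Tamil, for example, lacks many of the consonants present in Devanagari), so a shifted character may land on an unassigned code point and would need a script-specific fallback.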