Issues in Urdu-Hindi NER Output of Google and Bing Translator: An Orthographic Perspective

2019 International Journal of Recent Technology and Engineering
Named Entity Recognition (NER) is a sub-task of information extraction in which names are extracted from both text and linguistic corpora. Because of its long tail, NER remains a tough nut to crack for NLP researchers working on existing Machine Translation (MT) systems. For decades, NER has been an area of great interest in both MT and computational linguistics, and several tools have been designed to handle it in different languages. This paper therefore compares the end-user output of Google and Bing Translator with special reference to Urdu-Hindi NER, with the aim of providing insights for the development of intelligent language tools. On the one hand, the paper deals with orthographic challenges pertaining to Urdu-Hindi NER in general; on the other, it sheds light on transliteration issues in particular. We also investigate personal names and Urdu named entities, especially ezafat constructions. Based on the existing end-user output quality, the paper proposes handling NER from a language engineering point of view. Finally, the MT output of both Google and Bing is scored on a scale of 0 to 1, where 0 is assigned to correct output and 1 to wrong or inaccurate output.
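The scoring scale described above lends itself to a simple per-system comparison. The sketch below assumes each named-entity output receives either 0 (correct) or 1 (wrong or inaccurate) and aggregates these into an error rate per MT system, where lower is better. The function names `error_rate` and `compare_systems` and the score values are hypothetical placeholders for illustration; they are not data or code from the paper.

```python
from typing import Dict, List

# Scores per MT system, one entry per named entity, assuming the 0/1 scale:
# 0 = correct output, 1 = wrong or inaccurate output.
Scores = Dict[str, List[int]]


def error_rate(scores: List[int]) -> float:
    """Return the fraction of entities scored 1 (wrong or inaccurate)."""
    if not scores:
        raise ValueError("no scores given")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("scores must be 0 or 1")
    return sum(scores) / len(scores)


def compare_systems(scores: Scores) -> Dict[str, float]:
    """Compute an error rate for each system; lower is better on this scale."""
    return {system: error_rate(values) for system, values in scores.items()}


if __name__ == "__main__":
    # Hypothetical placeholder scores, NOT results reported in the paper.
    hypothetical = {
        "Google": [0, 1, 0, 0],
        "Bing": [1, 1, 0, 0],
    }
    for system, rate in compare_systems(hypothetical).items():
        print(f"{system}: {rate:.2f}")
```

Averaging binary scores gives the proportion of mistranslated entities, which keeps the comparison on the same 0 to 1 range as the individual scores.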
doi:10.35940/ijrte.d8067.118419 fatcat:wdoo6i3dabcxxpd2h46xuxhbfy