Divergences in English-Hindi Parallel Dependency Treebanks

Himani Chaudhry, Himanshu Sharma, Dipti Misra Sharma
2013 International Conference on Dependency Linguistics  
We present our analysis of systematic divergences in parallel English-Hindi dependency treebanks based on the Computational Paninian Grammar (CPG) framework. The study of structural divergences in parallel treebanks not only helps in developing larger treebanks automatically, but can also be useful for many NLP applications such as data-driven machine translation (MT) systems. Given that the two treebanks are based on the same grammatical model, a study of divergences in them could be of [...] advantage to such tasks, along with making it more interesting to study how and where they diverge. We consider two parallel trees divergent based on differences in constructions, relations marked, frequency of annotation labels and tree depth. Some interesting instances of structural divergences in the treebanks are discussed in the course of this paper. We also present our task of aligning the two treebanks, describing our extraction of divergent structures in the trees, and discuss the results of this exercise.
dblp:conf/depling/ChaudhrySS13 fatcat:drkn3ap4yvcrtgumedgfiwa3qy