Language clustering with word co-occurrence networks based on parallel texts

HaiTao Liu, Jin Cong
2013 Chinese Science Bulletin  
This study investigates the feasibility of applying complex networks to fine-grained language classification and of employing word co-occurrence networks based on parallel texts as a substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed from parallel texts of 12 Slavic languages and 2 non-Slavic languages, one network per language. With appropriate combinations of major parameters of these networks, cluster ... was able to distinguish the Slavic languages from the non-Slavic ones and correctly group the Slavic languages into their respective sub-branches. Moreover, the clustering also captured the genetic relationships of some of these Slavic languages within their sub-branches. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and that they constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification.
Keywords: word co-occurrence network, Slavic languages, parallel texts, language classification, cluster analysis
Citation: Liu H T, Cong J. Language clustering with word co-occurrence networks based on parallel texts.
doi:10.1007/s11434-013-5711-8 fatcat:4sxlq7eojfcb5o4m2b6wy3dmum
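
The pipeline summarized in the abstract (one word co-occurrence network per language, a few network parameters per network, then cluster analysis on those parameters) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which parameters or which clustering method were used, so the choice of average degree and average clustering coefficient, the single-linkage merging, and all function names here are assumptions. The token lists are toy placeholders, not data from the study.

```python
from collections import defaultdict
from itertools import combinations
import math


def cooccurrence_network(tokens):
    """Build an undirected, unweighted network linking adjacent word types."""
    adj = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from immediate repetitions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_parameters(adj):
    """Return (average degree, average clustering coefficient).

    Assumption: these two parameters stand in for the unspecified
    "major parameters" mentioned in the abstract.
    """
    n = len(adj)
    avg_degree = sum(len(nb) for nb in adj.values()) / n
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return (avg_degree, total / n)


def agglomerate(vectors):
    """Single-linkage agglomerative clustering; returns the merge order.

    Assumption: single linkage with Euclidean distance is used only for
    illustration; the abstract does not name the clustering method.
    """
    clusters = [(label,) for label in sorted(vectors)]
    merges = []

    def linkage(c1, c2):
        return min(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


# Toy usage with hypothetical labels and placeholder token sequences.
toy_texts = {
    "lang_A": "a b c a d b".split(),
    "lang_B": "a b c a d c".split(),
    "lang_C": "x y x y x y".split(),
}
params = {
    label: network_parameters(cooccurrence_network(tokens))
    for label, tokens in toy_texts.items()
}
merge_order = agglomerate(params)
```

In practice the parameter values would usually be standardized before distances are computed, since parameters such as average degree and clustering coefficient live on different scales.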