A Probabilistic Generative Model of Linguistic Typology

Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein
2019 Proceedings of the 2019 Conference of the North  
In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry: we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. Furthermore, we show that language embeddings pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and also theoretical implications: the results confirm what linguists have hypothesised, i.e. that there are significant correlations between typological features and languages.
doi:10.18653/v1/n19-1156 dblp:conf/naacl/BjervaKCA19 fatcat:knu6nj3gqzg5loth45n3vugjqi