A Joint Model of Orthography and Morphological Segmentation

Ryan Cotterell, Tim Vieira, Hinrich Schütze
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
We present a model of morphological segmentation that jointly learns to segment and restore orthographic changes, e.g., funniest → fun-y-est. We term this form of analysis canonical segmentation and contrast it with the traditional surface segmentation, which segments a surface form into a sequence of substrings, e.g., funniest → funn-i-est. We derive an importance sampling algorithm for approximate inference in the model and report experimental results on English, German and Indonesian.
doi:10.18653/v1/n16-1080 dblp:conf/naacl/CotterellVS16 fatcat:mx5umgpmtfh6fn4447ynlnxd44
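
The distinction the abstract draws between the two analyses can be illustrated with a minimal sketch. It assumes segmentations are written as lists of strings; the helper name `is_surface_segmentation` is hypothetical and does not come from the paper. The sketch only checks the defining property of a surface segmentation (its segments concatenate back to the surface form). It does not implement the joint model or its importance sampling inference.

```python
def is_surface_segmentation(word, segments):
    """Return True when the segments are substrings of the word that
    concatenate back to it exactly (hypothetical helper for illustration)."""
    return "".join(segments) == word


word = "funniest"

# Surface segmentation from the abstract: funniest -> funn-i-est
surface = ["funn", "i", "est"]

# Canonical segmentation from the abstract: funniest -> fun-y-est
# The orthographic changes are restored, so the segments no longer
# spell out the surface form when joined.
canonical = ["fun", "y", "est"]

print(is_surface_segmentation(word, surface))
print(is_surface_segmentation(word, canonical))
```

The first check succeeds and the second fails. A canonical segmenter therefore cannot simply place boundaries in the surface string; it also has to undo spelling changes.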