A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Sanskrit Word Segmentation Using Character-level Recurrent and Convolutional Neural Networks
2018
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
The paper introduces end-to-end neural network models that tokenize Sanskrit by jointly splitting compounds and resolving phonetic merges (Sandhi). Tokenization of Sanskrit depends on local phonetic and distant semantic features that are incorporated using convolutional and recurrent elements. Contrary to most previous systems, our models do not require feature engineering or external linguistic resources, but operate solely on parallel versions of raw and segmented text. The models discussed in
doi:10.18653/v1/d18-1295
dblp:conf/emnlp/HellwigN18
fatcat:ukccvdaedvdh5fqedhdi4jm6hm