GCM: A Toolkit for Generating Synthetic Code-mixed Text
2021
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
unpublished
Code-mixing is common in multilingual communities around the world, and processing it is challenging due to the lack of labeled and unlabeled data. We describe a tool that can automatically generate code-mixed data given parallel data in two languages. We implement two linguistic theories of code-mixing, the Equivalence Constraint theory and the Matrix Language theory, to generate all possible code-mixed sentences for the language pair, followed by sampling of the generated data to generate
doi:10.18653/v1/2021.eacl-demos.24
fatcat:wsdt6plbtzdyfhvayszeg5tkia