A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
The file type is application/pdf.
Cross-corpus Native Language Identification via Statistical Embedding
2018
Proceedings of the Second Workshop on Stylistic Variation
unpublished
In this paper, we approach the task of native language identification in a realistic cross-corpus scenario, where a model is trained on the available data and has to predict the native language from data of a different corpus. We propose a statistical embedding representation that reports a significant improvement over common single-layer approaches of the state of the art, identifying Chinese, Arabic, and Indonesian in a cross-corpus scenario. The proposed approach is shown to be competitive even when the data is scarce and imbalanced.
doi:10.18653/v1/w18-1605
fatcat:lppobndm7zesndxthge7bixdqe