A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is
Sequence model learning algorithms typically maximize log-likelihood minus the norm of the model (or minimize Hamming loss + norm). In cross-lingual part-of-speech (POS) tagging, our target language training data consists of sequences of sentences with word-by-word labels projected from translations in k languages for which we have labeled data, via word alignments. Our training data is therefore very noisy, and if Rademacher complexity is high, learning algorithms are prone to overfit.arXiv:1601.02166v1 fatcat:fbjqdi7kp5gyjgfks4e6f3zqy4