Entropy-based Training Data Selection for Domain Adaptation
International Conference on Computational Linguistics
Training data selection is a common method for domain adaptation, the goal of which is to choose a subset of training data that works well for a given test set. It has been shown to be effective for tasks such as machine translation and parsing. In this paper, we propose several entropy-based measures for training data selection and test their effectiveness on two tasks: Chinese word segmentation and part-of-speech tagging. The experimental results on the Chinese Penn Treebank indicate that ...
dblp:conf/coling/SongKXK12 fatcat:6z4t5jf4ubamlf7jfzgl5vnoky