Entropy-based Training Data Selection for Domain Adaptation

Yan Song, Prescott Klassen, Fei Xia, Chunyu Kit
2012 International Conference on Computational Linguistics  
Training data selection is a common method for domain adaptation, the goal of which is to choose a subset of training data that works well for a given test set. It has been shown to be effective for tasks such as machine translation and parsing. In this paper, we propose several entropy-based measures for training data selection and test their effectiveness on two tasks: Chinese word segmentation and part-of-speech tagging. The experimental results on the Chinese Penn Treebank indicate that ... of the measures provide a statistically significant improvement over random selection for both tasks.
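
As a rough illustration of what entropy-based selection can look like, the following minimal sketch ranks candidate training sentences by their per-character cross-entropy under a model estimated from the test set, then keeps the k lowest-scoring ones. The abstract does not specify the proposed measures, so everything here is an assumption made for illustration: the add-one-smoothed character unigram model, the function names (char_unigram_model, cross_entropy, select_by_entropy), and the choice to keep the lowest-entropy sentences. It should not be read as the authors' method.

```python
import math
from collections import Counter


def char_unigram_model(sentences):
    """Estimate add-one-smoothed character probabilities from sentences.

    Illustrative assumption: a character unigram model stands in for
    whatever model the entropy measure is computed against.
    """
    counts = Counter(ch for s in sentences for ch in s)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by unseen characters

    def prob(ch):
        return (counts.get(ch, 0) + 1) / (total + vocab)

    return prob


def cross_entropy(sentence, prob):
    """Average negative log2 probability per character of the sentence."""
    if not sentence:
        return float("inf")  # empty sentences are ranked last
    return -sum(math.log2(prob(ch)) for ch in sentence) / len(sentence)


def select_by_entropy(train, test, k):
    """Return the k training sentences closest to the test set.

    'Closest' here means lowest cross-entropy under the test-set model;
    this ranking direction is an assumption of the sketch.
    """
    prob = char_unigram_model(test)
    return sorted(train, key=lambda s: cross_entropy(s, prob))[:k]
```

For example, select_by_entropy(train_sentences, test_sentences, k=1000) would return the 1000 training sentences whose characters are most probable under the test-set model; a random-selection baseline of the same size would be the natural comparison.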
dblp:conf/coling/SongKXK12 fatcat:6z4t5jf4ubamlf7jfzgl5vnoky