Domain Adaptation via Pseudo In-Domain Data Selection
2011
Conference on Empirical Methods in Natural Language Processing
We explore efficient domain adaptation for the task of statistical machine translation based on extracting sentences from a large general-domain parallel corpus that are most relevant to the target domain. These sentences may be selected with simple cross-entropy-based methods, of which we present three. As these sentences are not themselves identical to the in-domain data, we call them pseudo in-domain subcorpora. These subcorpora, 1% the size of the original, can then be used to train small
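The selection idea in the abstract can be illustrated with a minimal sketch. The abstract as preserved here does not spell out the three methods, so the following is only one plausible instance of cross-entropy-based selection: general-domain sentences are ranked by their per-word cross-entropy under a language model trained on in-domain text, and the lowest-scoring fraction is kept. The add-one-smoothed unigram model, the function names (train_unigram, cross_entropy, select_pseudo_in_domain), and the fraction parameter are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count whitespace-separated tokens; returns (counts, total_tokens)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy (bits) of a sentence under an add-one unigram model."""
    counts, total = model
    tokens = sentence.split()
    if not tokens:
        return float("inf")  # empty sentences are never preferred
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)


def select_pseudo_in_domain(in_domain, general, fraction=0.01):
    """Keep the fraction of general-domain sentences with the lowest
    cross-entropy under the in-domain model (illustrative assumption)."""
    model = train_unigram(in_domain)
    vocab = {tok for s in list(in_domain) + list(general) for tok in s.split()}
    ranked = sorted(general, key=lambda s: cross_entropy(s, model, len(vocab)))
    keep = max(1, int(len(general) * fraction))
    return ranked[:keep]
```

With a handful of in-domain sentences and a general-domain pool, select_pseudo_in_domain(in_domain, general, fraction=0.01) returns the slice of the pool that the in-domain model finds least surprising. A real system would use a stronger language model than unigrams; the ranking-and-thresholding structure is the point of the sketch.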
dblp:conf/emnlp/AxelrodHG11
fatcat:raevgcmfyzdifhkenxbdqabpdu