First Steps Towards Coverage-Based Document Alignment
2016
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
In this paper we describe a method for selecting pairs of parallel documents (documents that are translations of each other) from a large collection of documents obtained from the web. Our approach is based on a coverage score that reflects the number of distinct bilingual phrase pairs found in each pair of documents, normalized by the total number of unique phrases found in them. Since parallel documents tend to share more bilingual phrase pairs than non-parallel documents, our alignment
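The coverage score described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ngrams`, `coverage_score`), the toy phrase table, whitespace tokenization, the n-gram length limit, and the choice to normalize by the sum of unique phrases in both documents are all assumptions made for illustration only.

```python
def ngrams(tokens, max_len=3):
    """Return the set of unique n-grams (joined by spaces) of length 1..max_len."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def coverage_score(src_text, tgt_text, phrase_table, max_len=3):
    """Distinct bilingual phrase pairs found in the document pair, divided by
    the total number of unique phrases in both documents (assumed normalization)."""
    src_phrases = ngrams(src_text.lower().split(), max_len)
    tgt_phrases = ngrams(tgt_text.lower().split(), max_len)
    # A phrase pair counts once if its source side occurs in the source
    # document and its target side occurs in the target document.
    matched = {
        (s, t) for (s, t) in phrase_table
        if s in src_phrases and t in tgt_phrases
    }
    total = len(src_phrases) + len(tgt_phrases)
    return len(matched) / total if total else 0.0


# Hypothetical English-Portuguese phrase table, for illustration only.
phrase_table = {
    ("the house", "a casa"),
    ("house", "casa"),
    ("blue", "azul"),
    ("dog", "cão"),
}

en_doc = "the house is blue"
pt_parallel = "a casa é azul"
pt_unrelated = "o cão come"

parallel_score = coverage_score(en_doc, pt_parallel, phrase_table)
unrelated_score = coverage_score(en_doc, pt_unrelated, phrase_table)
print(parallel_score, unrelated_score)
```

In this toy setting the translated pair shares three phrase pairs while the unrelated pair shares none, so ranking candidate pairs by score would prefer the parallel document.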
doi:10.18653/v1/w16-2369
dblp:conf/wmt/GomesL16
fatcat:nwbrd2eed5bovljzbfnjuswi4a