As technology advances, the volume of information and knowledge disseminated on the Internet each day has grown severalfold. At the same time, a large amount of bilingual data has been produced in the real world. These data are a valuable resource for statistical machine translation research. To screen a corpus by the quality of its bilingual sentence pairs, two corpus screening strategies are first proposed, based on the sentence-pair length ratio method
doi:10.1155/2020/5828130 fatcat:ygls6pzwc5emrmupt6ml2tppau
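
The abstract names a sentence-pair length ratio method without giving its details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes whitespace tokenization (which would not suit unsegmented languages such as Chinese) and a hypothetical threshold `max_ratio=2.0`; the function names `length_ratio` and `filter_by_length_ratio` are illustrative.

```python
from typing import List, Tuple


def length_ratio(src: str, tgt: str) -> float:
    """Ratio of the longer token count to the shorter one (always >= 1).

    Assumption: tokens are whitespace-separated words.
    An empty side yields infinity so the pair is always rejected.
    """
    src_len = len(src.split())
    tgt_len = len(tgt.split())
    if src_len == 0 or tgt_len == 0:
        return float("inf")
    return max(src_len, tgt_len) / min(src_len, tgt_len)


def filter_by_length_ratio(
    pairs: List[Tuple[str, str]], max_ratio: float = 2.0
) -> List[Tuple[str, str]]:
    """Keep only pairs whose length ratio does not exceed max_ratio.

    max_ratio is a hypothetical threshold chosen for illustration.
    """
    return [(s, t) for s, t in pairs if length_ratio(s, t) <= max_ratio]


pairs = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),
    ("hello", "ceci est une phrase beaucoup trop longue pour ce mot"),
    ("", "phrase sans source"),
]
kept = filter_by_length_ratio(pairs)
```

In this sketch only the first pair survives: the second has a 1:10 token ratio and the third has an empty source side. In practice the threshold would be tuned per language pair, since typical length ratios differ between languages.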