Creating and exploiting a comparable corpus in cross-language information retrieval
2007
ACM Transactions on Information Systems
We present a method for creating a comparable text corpus from two document collections in different languages. The collections can be very different in origin: in this study we build a comparable corpus from articles by a Swedish news agency and a U.S. newspaper. The keys with the best resolution power were extracted from the documents of one collection, the source collection, by using the relative average term frequency (RATF) value. The keys were translated into the language of the other
doi:10.1145/1198296.1198300
fatcat:ajomowrcl5agphti32eltqcmfy