A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2015; you can also visit the original URL.
The file type is
Proceedings of the 20th international conference on Computational Linguistics - COLING '04
We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machinedoi:10.3115/1220355.1220406 fatcat:tybgzpn2ljfiha3emzbqzsqmna