Crowdsourcing Interaction Logs to Understand Text Reuse from the Web
2013
Annual Meeting of the Association for Computational Linguistics
We report on the construction of the Webis text reuse corpus 2012 for advanced research on text reuse. The corpus compiles manually written documents obtained from a completely controlled, yet representative environment that emulates the web. Each of the 297 documents in the corpus is about one of the 150 topics used at the TREC Web Tracks 2009-2011, thus forming a strong connection with existing evaluation efforts. Writers, hired at the crowdsourcing platform oDesk, had to retrieve sources for
dblp:conf/acl/PotthastHVS13
fatcat:ictfs2hjxfeobnj6fucfowtpzi