TAGME

Paolo Ferragina, Ugo Scaiella
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
In this paper we address the problem of accurately and efficiently cross-referencing text fragments with Wikipedia pages, so that structured knowledge is provided about the (unstructured) input text by resolving synonymy and polysemy. We take inspiration from the invited talk of Chakrabarti at WSDM 2010, and extend his proposed scenario from the annotation of entire documents to the annotation of short texts, such as snippets of search-engine results, tweets, news, etc. These short and poorly composed texts pose new challenges in terms of efficiency and effectiveness of the annotation process, which we address by proposing Tagme, the first system that performs an accurate and on-the-fly annotation of these short textual fragments. A large set of experiments shows that Tagme significantly outperforms state-of-the-art algorithms [11, 15] when they are adapted to work on short texts and, surprisingly, proves competitive (if not superior) on long texts, with the added benefit of being faster.
doi:10.1145/1871437.1871689 dblp:conf/cikm/FerraginaS10 fatcat:uldajld4vbbxrn5z57otfqwhmm