Graph of Words

Michalis Vazirgiannis
2017 Proceedings of the 26th International Conference on World Wide Web Companion - WWW '17 Companion  
The Bag-of-words model has been the dominant approach for IR and Text mining for many years assuming the word independence and the frequencies as the main feature for feature selection and for query to document similarity. Although the long and successful usage, bag-of-words ignores words' order and distance within the document -weakening thus the expressive power of the distance metrics. We propose graph-of-word, an alternative approach that capitalizes on a graph representation of documents
more » ... d challenges the word independence assumption by taking into account words' order and distance. We applied graph-of-word in various tasks such as ad-hoc Information Retrieval, Single-Document Keyword Extraction, Text Categorization and Subevent Detection in Textual Streams. In all cases the the graph of word approach, assisted by degeneracy at times, outperforms the state of the art base lines in all cases.
doi:10.1145/3041021.3055362 dblp:conf/www/Vazirgiannis17 fatcat:vlplcnl3abd2foyqnirsbp75wa