Scalable topical phrase mining from text corpora
2014
Proceedings of the VLDB Endowment
While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on the inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either post-processes the results of unigram-based topic models or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately sized datasets.
doi:10.14778/2735508.2735519
fatcat:l5dgrmk3sngg3aghbnyrbahmli