A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2008; you can also visit the original URL.
The file type is application/pdf.
Rule-based word clustering for document metadata extraction
2005
Proceedings of the 2005 ACM symposium on Applied computing - SAC '05
Text classification is still an important problem for unlabeled text; CiteSeer, a computer science document search engine, uses automatic text classification methods for document indexing. Text classification uses a document's original text words as the primary feature representation. However, such representation usually comes with high dimensionality and feature sparseness. Word clustering is an effective approach to reduce feature dimensionality and feature sparseness, and improve text
doi:10.1145/1066677.1066917
dblp:conf/sac/HanMZTGZ05
fatcat:bd4uexds5zdxvofvac575fpwwy