Random-Walk Term Weighting for Improved Text Classification

Samer Hassan, Rada Mihalcea, Carmen Banea
2007 International Conference on Semantic Computing (ICSC 2007)  
This paper describes a new approach for estimating term weights in a document, and shows how the new weighting scheme can be used to improve the accuracy of a text classifier. The method uses term co-occurrence as a measure of dependency between word features. A random-walk model is applied on a graph encoding words and co-occurrence dependencies, resulting in scores that represent a quantification of how a particular word feature contributes to a given context. Experiments performed on three standard classification datasets show that the new random-walk based approach outperforms the traditional term frequency approach of feature weighting.
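
The abstract does not spell out the exact random-walk formulation, so the following is only a minimal sketch of the general idea, assuming a PageRank-style walk over an undirected, unweighted co-occurrence graph. The function name `random_walk_weights`, the `window` size, the `damping` factor of 0.85, and the fixed iteration count are all illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def random_walk_weights(tokens, window=2, damping=0.85, iterations=50):
    """Score each distinct token with a PageRank-style walk (assumed formulation)."""
    # Build an undirected graph: two distinct words are linked when one
    # appears within `window` positions after the other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        neighbors[word]  # touch the key so isolated words still get a node
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    n = len(neighbors)
    if n == 0:
        return {}

    # Start from a uniform distribution and apply the update a fixed number of times.
    scores = {word: 1.0 / n for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(scores[v] / len(neighbors[v]) for v in neighbors[word])
            new_scores[word] = (1 - damping) / n + damping * incoming
        scores = new_scores
    return scores


if __name__ == "__main__":
    doc = "the cat sat on the mat".split()
    for word, score in sorted(random_walk_weights(doc, window=1).items()):
        print(f"{word}: {score:.3f}")
```

In a sketch like this, the resulting scores would stand in for raw term frequencies when building the document's feature vector. How the scores are combined with other weighting factors, if at all, is not stated in the abstract.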
doi:10.1109/icsc.2007.56 dblp:conf/semco/HassanMB07 fatcat:wpyjwpp6crcyzcm6um4uboe6da