Experience of Using SVM for the Triage Task in TREC 2004 Genomics Track

Dell Zhang, Wee Sun Lee
2004 Text Retrieval Conference  
This paper reports our knowledge-ignorant machine learning approach to the triage task in the TREC 2004 Genomics Track, which is in effect a text categorization problem. We applied a Support Vector Machine (SVM) and found that information-gain-based feature selection is helpful. Although we achieved decent performance in leave-one-out cross-validation experiments, the evaluation result on the test data turned out to be surprisingly poor. Further experiments revealed that there is a chasm between the training and test data distributions. It seems that more aggressive feature selection can partially alleviate the trouble caused by distribution change.
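
The abstract names information-gain based feature selection but does not say how the scores were computed. As a rough illustration only, the sketch below assumes each document is reduced to a set of terms, each term is treated as a binary present/absent feature, and labels are binary. The function names (`entropy`, `information_gain`, `select_top_terms`) and the toy documents are hypothetical; they are not taken from the paper or from the TREC data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs.

    Assumption: docs is a list of sets of terms (binary term presence).
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Invented toy corpus: 1 = relevant for triage, 0 = not relevant.
docs = [
    {"gene", "mutant", "mouse"},
    {"mouse", "phenotype", "gene"},
    {"protein", "assay"},
    {"assay", "gene"},
]
labels = [1, 1, 0, 0]

top_terms = select_top_terms(docs, labels, 2)
print(top_terms)
```

Lowering `k` corresponds to what the abstract calls more aggressive feature selection: fewer terms are kept and passed on to the classifier. Whether that helps under a train/test distribution shift is the paper's empirical claim; this sketch does not demonstrate it.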
dblp:conf/trec/ZhangL04 fatcat:nobsnf5zg5bptpnivnsteat2ya