A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017.
The file type is application/pdf.
Substring selection for biomedical document classification
2006
Proceedings of the 1st international workshop on Text mining in bioinformatics - TMBIO '06
Motivation: Attribute selection is a critical step in the development of document classification systems. As standard practice, words are stemmed and the most informative ones are used as attributes for classification. Owing to the high complexity of biomedical terminology, general-purpose stemming algorithms are often conservative and can also remove informative stems. This can reduce accuracy, especially when the number of labeled documents is small. To address this issue, we propose an
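The baseline practice described in the abstract (stem words, then keep the most informative stems as classification attributes) can be sketched as below. This is not the authors' proposed method, which the truncated abstract does not describe. The suffix list in `crude_stem`, the use of information gain as the informativeness score, and every function name here are hypothetical choices made for illustration only.

```python
import math
import re


def crude_stem(word: str) -> str:
    # Hypothetical, deliberately simple suffix stripper (NOT the Porter stemmer).
    for suffix in ("ations", "ation", "ings", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    # Lowercase, split on non-alphanumerics, then stem each token.
    return [crude_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def entropy(counts: list[int]) -> float:
    # Shannon entropy (bits) of a label distribution given as counts.
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs: list[set[str]], labels: list[int], term: str) -> float:
    # IG(term) = H(labels) - H(labels | term present/absent); an assumed score.
    label_set = sorted(set(labels))
    base = entropy([labels.count(l) for l in label_set])
    with_t = [lab for d, lab in zip(docs, labels) if term in d]
    without_t = [lab for d, lab in zip(docs, labels) if term not in d]
    n = len(labels)
    cond = 0.0
    for subset in (with_t, without_t):
        if subset:
            cond += len(subset) / n * entropy([subset.count(l) for l in label_set])
    return base - cond


def select_attributes(texts: list[str], labels: list[int], k: int) -> list[str]:
    # Rank every stem by information gain (ties broken alphabetically); keep top k.
    docs = [set(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


if __name__ == "__main__":
    texts = [
        "kinase phosphorylation assays",
        "kinase binding assays",
        "clinical trial outcomes",
        "clinical patient outcomes",
    ]
    labels = [1, 1, 0, 0]
    print(select_attributes(texts, labels, 2))  # ['assay', 'clinical']
```

The toy suffix rules also hint at the problem the abstract raises: `crude_stem("kinases")` yields `"kinas"` while `crude_stem("kinase")` stays `"kinase"`, so general-purpose rules can split or distort biomedical terms before attribute selection ever sees them.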
doi:10.1145/1183535.1183537
fatcat:c2xgm2f4u5dy3ia6ga2ls3uvse