Integrating image data into biomedical text categorization

H. Shatkay, N. Chen, D. Blostein
2006 Bioinformatics  
Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation tasks that involved document classification, and provided training and test data sets. So far, these efforts have focused on analyzing only the text content of documents. However, as was noted in the KDD'02 text mining contest, where figure captions proved to be an invaluable feature for identifying documents of interest, images often provide curators with critical information. We examine the possibility of using information derived directly from image data, and of integrating it with text-based classification, for biomedical document categorization. We present a method for obtaining features from images and for using them, both alone and in combination with text, to perform the triage task introduced in the TREC Genomics track 2004. The task was to determine which documents are relevant to a given annotation task performed by the Mouse Genome Database curators. We show preliminary results, demonstrating that the method has strong potential to enhance and complement traditional text-based categorization methods.
doi:10.1093/bioinformatics/btl235 pmid:16873506 fatcat:wcwxlrfb5ngc5ep4nkx7nsac7y