Information Extraction of Multiple Categories from Pathology Reports

Yue Li, David Martínez
2010 Australasian Language Technology Association Workshop  
Pathology reports store information about a patient's cells and tissues, and they are crucial for monitoring the health of individuals and population groups. In this work we present an evaluation of supervised text classification models for the prediction of relevant categories in pathology reports. Our aim is to integrate automatic classifiers to improve the current workflow of medical experts, and we implement and evaluate different machine learning approaches for a large number of categories. Our results show that we are able to predict nominal categories with a high average F-score (81.3%), and that we can improve over the majority class baseline by relying on Naive Bayes and feature selection. We also find that the classification of numeric categories is harder, and deeper analysis would be required to predict these labels.
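To make the comparison in the abstract concrete, the following is a minimal sketch of a majority class baseline next to a multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the function names, the whitespace tokenisation, the toy reports, and the labels are all assumptions made for illustration, and the feature selection step mentioned in the abstract is left out.

```python
import math
from collections import Counter, defaultdict


def majority_baseline(train_labels):
    """Return the most frequent training label; the baseline predicts it for every report."""
    return Counter(train_labels).most_common(1)[0][0]


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes model."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumed tokenisation: lowercase, split on whitespace
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch (not from the paper).
train_docs = [
    "invasive carcinoma present",
    "invasive tumour identified",
    "benign tissue only",
    "no malignancy benign",
    "benign polyp",
]
train_labels = ["malignant", "malignant", "benign", "benign", "benign"]

baseline_label = majority_baseline(train_labels)
model = train_nb(train_docs, train_labels)
nb_label = predict_nb(model, "invasive carcinoma identified")
```

On this toy input the baseline always answers with the majority label, while the Naive Bayes model can use the report's words to pick a minority label. That gap is what an improvement over the majority class baseline measures.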
dblp:conf/acl-alta/LiM10 fatcat:nchi2euhijdchp6idanwjivuc4