Pattern-based information extraction from pathology reports for cancer registration

Giulio Napolitano, Colin Fox, Richard Middleton, David Connolly
2010 Cancer Causes and Control  
Objective: To evaluate precision and recall rates for the automatic extraction of information from free-text pathology reports, and to assess the impact that implementation of pattern-based methods would have on cancer registration completeness.
Method: Over 300,000 electronic pathology reports were scanned for the extraction of Gleason score, Clark level and Breslow depth by a number of Perl routines, progressively enhanced by trial and error. An additional test set of 915 reports ... y containing Gleason score was used for evaluation.
Results: Recall and precision values of over 98% and 99%, respectively, were easily reached. A potential increase in cancer staging completeness of up to 32% was demonstrated.
Conclusions: In cancer registration, simple pattern matching applied to free-text documents can be used effectively to improve the completeness and accuracy of pathology information.
doi:10.1007/s10552-010-9616-4 pmid:20652738 fatcat:2fy5cgwjvrbqfpiadlup2xkpei
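
The kind of pattern matching the abstract describes can be illustrated with a minimal sketch. The authors used Perl routines that are not reproduced in this record; the Python regular expression, the function names `extract_gleason` and `precision_recall`, the sample report strings, and the counting convention for precision and recall below are all assumptions made for illustration, not the published method.

```python
import re

# Hypothetical pattern (not the authors' Perl routines): the word "Gleason",
# an optional "score"/"grade"/"sum", then either "3+4" (optionally "=7")
# or a bare total such as "7".
GLEASON = re.compile(
    r"gleason(?:'s)?\s*(?:score|grade|sum)?\s*[:=]?\s*"
    r"(?:(?P<primary>[1-5])\s*\+\s*(?P<secondary>[1-5])(?:\s*=\s*\d{1,2})?"
    r"|(?P<total>\d{1,2}))",
    re.IGNORECASE,
)


def extract_gleason(text):
    """Return the Gleason total found in a report, or None if no match."""
    m = GLEASON.search(text)
    if m is None:
        return None
    if m.group("primary"):
        # "a+b" form: the score is the sum of the two pattern grades.
        return int(m.group("primary")) + int(m.group("secondary"))
    total = int(m.group("total"))
    # Discard totals outside the 2..10 range a Gleason score can take.
    return total if 2 <= total <= 10 else None


def precision_recall(predicted, gold):
    """Assumed convention: precision = correct / extracted,
    recall = correct / reports that truly contain a value."""
    pairs = list(zip(predicted, gold))
    extracted = sum(1 for p, _ in pairs if p is not None)
    relevant = sum(1 for _, g in pairs if g is not None)
    correct = sum(1 for p, g in pairs if p is not None and p == g)
    precision = correct / extracted if extracted else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Made-up report snippets with hand-assigned reference values.
reports = [
    "Gleason score 3+4=7",
    "Gleason 4 + 5",
    "Gleason score: 6",
    "No malignancy seen.",
    "Gleason sum seven",  # spelled-out number: missed by this pattern
]
gold = [7, 9, 6, None, 7]

predicted = [extract_gleason(r) for r in reports]
precision, recall = precision_recall(predicted, gold)
```

The last sample shows why such patterns tend to be refined iteratively: each missed phrasing found during review suggests a new alternative to add to the expression, which fits the trial-and-error enhancement the abstract mentions.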