Tumour Classification and Analysis from Breast Cancer Pathology Reports using Natural Language Processing

G. Johanna Johnsi Rani, D. Gladis, Joy John Mammen, Marie Therese Manipadam
2015 Indian Journal of Science and Technology  
Breast cancer is the prime cause of death in Indian women. Hospitals in India collect and report data electronically. One such report is the pathology report, which contains natural-language narrations of patients' conditions. This work aims to extract the details of the tumour (T) in the breast using pattern-matching rules and to derive the pathological classification of T by applying the pTNM classification protocol of the American Joint Committee on Cancer (AJCC). Information ... (IR), Natural Language Processing (NLP) tasks and Information Extraction (IE) techniques are applied to develop an automated system for this task. The system compares the extracted and classified values of T against gold-standard values, which are derived by manual scrutiny of the reports. Evaluated on three sets of pathology reports, the automated system achieved an average precision of 86%, recall of 82.7%, specificity of 75.1% and accuracy of 79.53%.
doi:10.17485/ijst/2015/v8i29/86268 fatcat:kykrxuby7fhqrdrt3ielgxeuju
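
The abstract describes extracting tumour details with pattern-matching rules, deriving a pathological T category, and scoring the result with precision, recall, specificity and accuracy. The sketch below illustrates that general idea only; it is not the authors' system. The regular expression, the function names (`largest_size_mm`, `pt_category`, `evaluation_metrics`) and the sample sentence are all hypothetical, and the size thresholds are the commonly cited AJCC size bands for breast tumours (up to 20 mm, over 20 mm up to 50 mm, over 50 mm), which should be checked against the staging manual edition in use. Categories that do not depend on size alone, such as pT4 or pTis, are deliberately left out.

```python
import re

# Hypothetical pattern: one or more dimensions joined by "x", followed by a
# unit, e.g. "18 mm" or "3 x 2.5 x 2 cm". Not taken from the paper.
DIM_RE = re.compile(
    r"(\d+(?:\.\d+)?(?:\s*[x×]\s*\d+(?:\.\d+)?)*)\s*(cm|mm)\b",
    re.IGNORECASE,
)


def largest_size_mm(text):
    """Return the largest dimension mentioned in the text, in mm, or None."""
    sizes = []
    for dims, unit in DIM_RE.findall(text):
        factor = 10.0 if unit.lower() == "cm" else 1.0
        for part in re.split(r"\s*[x×]\s*", dims, flags=re.IGNORECASE):
            sizes.append(float(part) * factor)
    return max(sizes) if sizes else None


def pt_category(size_mm):
    """Map a tumour size in mm to a size-based pT category (assumed bands)."""
    if size_mm <= 20:
        return "pT1"
    if size_mm <= 50:
        return "pT2"
    return "pT3"


def evaluation_metrics(tp, fp, tn, fn):
    """Compute the four measures named in the abstract from confusion counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up report sentence, for illustration only.
sentence = "Tumour measures 3 x 2.5 x 2 cm and is firm."
size = largest_size_mm(sentence)
print(size, pt_category(size))
```

Taking the largest stated dimension is a simplifying assumption; a real report may list several lesions, margins or specimen sizes in the same units, so a production rule set would need context around each match before treating it as the tumour size.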