Identifying Anatomical Phrases in Clinical Reports by Shallow Semantic Parsing Methods

Vijayaraghavan Bashyam, Ricky K Taira
2007 IEEE Symposium on Computational Intelligence and Data Mining
Natural Language Processing (NLP) is being applied to several information extraction tasks in the biomedical domain. The unique nature of clinical information calls for an NLP system designed specifically for the clinical domain. We describe a method to identify semantically coherent phrases within clinical reports. This is an important step towards full syntactic parsing within a clinical NLP system. We use this semantic phrase chunker to identify anatomical phrases within radiology reports related to the genitourinary domain. A discriminative classifier based on support vector machines was used to classify words into one of five phrase classification categories. The classifier was trained on 1000 hand-tagged sentences from a corpus of genitourinary radiology reports. Features used by the classifier include n-grams, syntactic tags and semantic labels. Evaluation was conducted on a blind test set of 250 sentences from the same domain. The system achieved overall performance scores of 0.87 (precision), 0.91 (recall) and 0.89 (balanced F-score). Anatomical phrase extraction can be rapidly and accurately accomplished.
doi:10.1109/cidm.2007.368874 dblp:conf/cidm/BashyamT07 fatcat:orrds45cc5bqvkrl3dwqodcl5y