A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts

Tianyong Hao, Xiaoyi Pan, Zhiying Gu, Yingying Qu, Heng Weng
2018 BMC Medical Informatics and Decision Making  
Background: Temporal expression extraction and normalization is a fundamental step in clinical text processing and analysis. Although a variety of commonly used NLP tools are available for medical temporal information extraction, few perform satisfactorily on multi-lingual heterogeneous clinical texts. Methods: A novel method called TEER is proposed for both multi-lingual temporal expression extraction and normalization from various types of narrative clinical texts, including clinical data ... sts, clinical notes, and clinical trial summaries. TEER is characterized by temporal feature summarization, heuristic rule generation, and automatic pattern learning. By representing a temporal expression as a triple <M, A, N>, TEER identifies temporal mentions M, assigns type attributes A to M, and normalizes the values of M into formal representations N. Results: TEER was compared with six state-of-the-art baselines on two heterogeneous clinical text datasets: 400 actual clinical requests in English and 1459 clinical discharge summaries in Chinese. TEER achieved a precision of 0.948 and a recall of 0.877 on the English clinical requests, and a precision of 0.941 and a recall of 0.932 on the Chinese discharge summaries. Conclusions: An automated method, TEER, for multi-lingual temporal expression extraction was presented. On the two datasets, the comparison results demonstrated the effectiveness of TEER in multi-lingual temporal expression extraction from heterogeneous narrative clinical texts.
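To make the triple representation concrete, the following is a minimal Python sketch of how a mention M, a type attribute A, and a normalized value N might be held together and produced by hand-written patterns. It is an illustration only and not the TEER implementation: the class name `TemporalTriple`, the function `extract_triples`, the two regular expressions, the attribute labels `DATE` and `DURATION`, and the use of ISO 8601 strings as the normalized form are all assumptions made for this example, and the patterns are fixed by hand rather than learned.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class TemporalTriple:
    """Hypothetical container for one <M, A, N> triple."""
    mention: str      # M: the text span as it appears in the source
    attribute: str    # A: assumed type label, e.g. "DATE" or "DURATION"
    normalized: str   # N: assumed normal form (ISO 8601 style string)


# Hand-written illustrative patterns; a pattern-learning system would
# derive such patterns from data instead of hard-coding them.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
DURATION_PATTERN = re.compile(
    r"\b(\d+)\s+(day|week|month|year)s?\b", re.IGNORECASE
)
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def extract_triples(text: str) -> List[TemporalTriple]:
    """Return the triples found in `text`, ordered by position."""
    found = []
    for match in DATE_PATTERN.finditer(text):
        # The mention is already in ISO date form, so N equals M here.
        found.append(
            (match.start(),
             TemporalTriple(match.group(0), "DATE", match.group(0)))
        )
    for match in DURATION_PATTERN.finditer(text):
        count = int(match.group(1))
        unit = UNIT_CODES[match.group(2).lower()]
        # Normalize to an ISO 8601 duration such as "P6M".
        found.append(
            (match.start(),
             TemporalTriple(match.group(0), "DURATION", f"P{count}{unit}"))
        )
    found.sort(key=lambda item: item[0])
    return [triple for _, triple in found]


if __name__ == "__main__":
    request = "Patients diagnosed after 2015-03-01 with follow-up of 6 months"
    for triple in extract_triples(request):
        print(triple)
```

Keeping the original mention alongside its normalized value lets downstream code compare dates and durations uniformly while still being able to point back to the source text.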
doi:10.1186/s12911-018-0595-9 pmid:29589563 pmcid:PMC5872502 fatcat:jjux5wwb4jfcxlz32jo6ndfpku