TeamX at CLEF eHealth 2020: ICD Coding with N-gram Encoder and Code-filtering Strategy

Yuki Tagawa, Norihisa Nakano, Ryota Ozaki, Tomoki Taniguchi, Tomoko Ohkuma
2020 Conference and Labs of the Evaluation Forum  
The International Classification of Diseases (ICD) is a medical classification that provides a systematized coding of diseases. ICD is widely used for statistical comparisons and patient billing; however, manual ICD coding is time-consuming and prone to errors. In this study, we work on automatic ICD10-CM and ICD10-PCS coding of Spanish clinical cases at CLEF eHealth 2020 Task 1. We treat ICD10-CM and ICD10-PCS coding as a multi-label classification problem, and our method has three main aspects: (i) N-gram encoder: learning N-gram embeddings by encoding an input document; (ii) Code-filtering strategy: reducing the label space by limiting the number of target codes; (iii) Weighted binary cross-entropy (BCE): extending the BCE to alleviate the data imbalance problem. We evaluated our method based on the mean average precision, achieving final scores of 0.299 for ICD10-CM and 0.199 for ICD10-PCS.

Introduction
In clinical practice, considerable amounts of text data (e.g., discharge summaries, radiology reports, and other narrative components of electronic health records) are created every day. Such data are managed using International Classification of Diseases (ICD) codes for reporting diagnoses and for statistical comparisons of morbidity and mortality. ICD is a medical classification provided by the World Health Organization, and it assigns unique alphanumeric codes to diseases, injuries, signs, procedures, and symptoms. Although ICD codes are widely used for statistical analysis, decision-making, and even reimbursement, manual ICD coding is time-consuming and prone to errors. Hence, automatic ICD coding is in high demand. Automatic ICD coding [12, 16, 19] is the prediction of suitable ICD codes on the basis of an input document. As a type of multilingual information extraction, the CLEF eHealth community has been organizing shared tasks on ICD
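
To illustrate the weighted BCE named in the abstract, the following is a minimal sketch of one common way to weight binary cross-entropy for imbalanced multi-label data: scaling the loss on positive labels by a constant. The excerpt does not give the authors' exact weighting scheme, so the function name `weighted_bce` and the scalar `pos_weight` are assumptions made for illustration only.

```python
import math


def weighted_bce(probs, labels, pos_weight=1.0, eps=1e-12):
    """Mean weighted binary cross-entropy over the labels of one document.

    probs      -- predicted probability per ICD code, each in [0, 1]
    labels     -- gold 0/1 indicator per ICD code
    pos_weight -- hypothetical scalar that scales the loss on positive labels
                  (an assumption; not taken from the paper)
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)
```

With `pos_weight=1.0` this reduces to the ordinary BCE. Values above 1 penalize missed positive codes more heavily, which may help when most codes are absent from most documents.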
dblp:conf/clef/TagawaNOTO20 fatcat:qbh2v6wdcbcwnpy7miawbkulru