Joint learning of representations of medical concepts and words from EHR data

Tian Bai, Ashis Kumar Chanda, Brian L. Egleston, Slobodan Vucetic
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
There has been increasing interest in learning low-dimensional vector representations of medical concepts from electronic health records (EHRs). While EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes, which provide more nuanced details on a patient's health status. In this work, we propose a method that jointly learns medical concept and word representations. In particular, we focus on capturing the relationship between medical codes and words by using a novel learning scheme for the word2vec model. Our method exploits relationships between different parts of EHRs in the same visit and embeds both codes and words in the same continuous vector space. In the end, we are able to derive clusters that reflect distinct disease and treatment patterns. In our experiments, we qualitatively show how our method of grouping words for given diagnostic codes compares with a topic modeling approach. We also test how well our representations can be used to predict disease patterns of the next visit. The results show that our approach outperforms several common methods.
doi:10.1109/bibm.2017.8217752 pmid:29375929 pmcid:PMC5783648 dblp:conf/bibm/BaiCEV17 fatcat:mjjw3b5dwzgtvf7stnlplgcqzq