GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records [article]

Xi Yang, Nima Pour Nejatian, Hoo Chang Shin, Kaleb Smith, Christopher Parisien, Colin Compas, Cheryl Martin, Mona Flores, Ying Zhang, Tanja Magoc, Christopher Harle, Gloria Lipori (+5 others)
2022 medRxiv   pre-print
There is an increasing interest in developing massive-size deep learning models in natural language processing (NLP) - the key technology to extract patient information from unstructured electronic health records (EHRs). However, there are limited studies exploring large language models in the clinical domain; the current largest clinical NLP model was trained with 110 million parameters (compared with 175 billion parameters in the general domain). It is not clear how large-size NLP models can help machines understand patients' clinical information from unstructured EHRs. In this study, we developed a large clinical transformer model - GatorTron - using >90 billion words of text and evaluated it on 5 clinical NLP tasks including clinical concept extraction, relation extraction, semantic textual similarity, natural language inference, and medical question answering. GatorTron is now the largest transformer model in the clinical domain that scaled up from the previous 110 million to 8.9 billion parameters and achieved state-of-the-art performance on the 5 clinical NLP tasks targeting various healthcare information documented in EHRs. GatorTron models perform better in understanding and utilizing patient information from clinical narratives in ways that can be applied to improvements in healthcare delivery and patient outcomes.
doi:10.1101/2022.02.27.22271257 fatcat:mqbbbce4wre5bbgdbkqd7o44fq