Applying Convolutional Neural Network Model and Auto-expanded Corpus to Biomedical Abbreviation Disambiguation

Ren Kai, Computer School, Wuhan University, Wuhan, 430072, China, Wang Shi-Wen, College of Computer Science, South-Central University for Nationalities, Wuhan, China, Confucius Institute, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ, 07305, United States
2016 Journal of Engineering Science and Technology Review  
The polysemy of abbreviations in the medical domain substantially reduces the accuracy of automated text analysis. Hence, abbreviation disambiguation has been extensively studied in recent years. Existing methods require a large quantity of manually labelled corpora to train their models, which restricts the application range of abbreviation disambiguation. This study proposes an abbreviation disambiguation method based on the convolutional neural network ... ) to solve the abbreviation disambiguation problem in the biomedical field when no labelled corpus exists. First, the full name of each ambiguous abbreviation was used as a keyword to retrieve a large quantity of texts from Medline as the training corpus. The corpus was then applied to the improved CNN model, through which each abbreviation was mapped onto its corresponding sense to complete the disambiguation. A test was conducted on 103 common biomedical abbreviations. Results show that the method achieved an average accuracy of 90.1%, which is significantly higher than that of the other unsupervised abbreviation disambiguation methods. This study provides a basis for effectively improving the accuracy of abbreviation disambiguation in the biomedical field without a large labelled corpus and for increasing the accuracy of follow-up work, such as information retrieval and relation extraction. Thus, the proposed method can be applied to computer analytical research on medical big data that is updated in real time.
doi:10.25103/jestr.096.27 fatcat:fhx7d4q265a3tk4jmxxuixh2aa