Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources [article]

Taolin Zhang, Chengyu Wang, Minghui Qiu, Bite Yang, Xiaofeng He, Jun Huang
2021 arXiv pre-print
Machine Reading Comprehension (MRC) aims to extract answers to questions from a given passage. It has been widely studied recently, especially in open domains. However, few efforts have been made on closed-domain MRC, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to simultaneously predict answers to medical questions and the corresponding support sentences from medical information sources, in order to ensure the high reliability of medical knowledge serving. A high-quality dataset, the Multi-task Chinese Medical MRC dataset (CMedMRC), is manually constructed for this purpose and analyzed in detail. We further propose the Chinese medical BERT model (CMedBERT) for the task, which fuses medical knowledge into pre-trained language models through a dynamic fusion mechanism over heterogeneous features and a multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
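
The abstract only sketches the model at a high level. As a rough illustration of what a "dynamic fusion mechanism" over context-aware and knowledge-aware token representations, trained with a multi-task objective, could look like, the PyTorch sketch below uses a token-level learned gate plus a weighted joint loss. The gating form, the hypothetical alpha weight, and the tensor shapes are assumptions for illustration, not the authors' actual formulation.

# Illustrative sketch only: a gated fusion of BERT token states with aligned
# knowledge embeddings, plus a simple multi-task loss. All design choices here
# (sigmoid gate, alpha weight, shapes) are assumptions, not the CMedBERT spec.
import torch
import torch.nn as nn


class GatedKnowledgeFusion(nn.Module):
    """Fuse context-aware token states with knowledge-aware states via a learned gate."""

    def __init__(self, hidden_size: int, kg_size: int):
        super().__init__()
        self.kg_proj = nn.Linear(kg_size, hidden_size)      # map KG embeddings to BERT space
        self.gate = nn.Linear(2 * hidden_size, hidden_size)  # token-level gate

    def forward(self, token_states, kg_states):
        # token_states: (batch, seq_len, hidden); kg_states: (batch, seq_len, kg_size)
        kg = self.kg_proj(kg_states)
        g = torch.sigmoid(self.gate(torch.cat([token_states, kg], dim=-1)))
        return g * token_states + (1.0 - g) * kg              # dynamic per-token mixture


def multi_task_loss(start_logits, end_logits, support_logits,
                    start_pos, end_pos, support_labels, alpha: float = 0.5):
    """Joint loss over answer-span extraction and support-sentence classification.

    alpha is a hypothetical task-weighting hyperparameter, not taken from the paper.
    """
    ce = nn.CrossEntropyLoss()
    bce = nn.BCEWithLogitsLoss()
    span_loss = 0.5 * (ce(start_logits, start_pos) + ce(end_logits, end_pos))
    support_loss = bce(support_logits, support_labels)
    return span_loss + alpha * support_loss


if __name__ == "__main__":
    fusion = GatedKnowledgeFusion(hidden_size=768, kg_size=100)
    tokens = torch.randn(2, 16, 768)   # stand-in for contextual (BERT) token outputs
    kg = torch.randn(2, 16, 100)       # stand-in for aligned medical-KG entity embeddings
    fused = fusion(tokens, kg)
    print(fused.shape)                 # torch.Size([2, 16, 768])

In this sketch, the gate lets each token decide how much knowledge to absorb, and the joint loss couples the answer-span and support-sentence objectives during training; the actual heterogeneous-feature fusion in CMedBERT may differ.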
arXiv:2008.10327v2 fatcat:x3knjsb22jhnvpvfq2t2o6sys4