Dialect topic modeling for improved consumer medical search

Steven P Crain, Shuang-Hong Yang, Hongyuan Zha, Yu Jiao
2010 AMIA Annual Symposium Proceedings  
Access to health information by consumers is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer-oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, common and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations using consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance, measured by nDCG@5, over an LDA baseline.
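For readers unfamiliar with the metric, nDCG@5 scores the top five results of a ranking against the best possible ordering of the same results. The following is a minimal sketch, assuming graded relevance labels, linear gain, and a log2 rank discount; the abstract does not state which nDCG variant the authors used, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results.

    Assumes linear gain (the relevance label itself) and a log2 discount;
    other variants use 2**rel - 1 as the gain.
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """nDCG@k: DCG of the given ranking divided by the DCG of the ideal one.

    Assumes `relevances` lists the judged relevance of every retrieved
    document in ranked order, so the ideal ordering is just a sort.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for one query, in ranked order.
score = ndcg_at_k([2, 0, 3, 1, 0], k=5)
```

A ranking that is already in ideal order scores 1.0, and a ranking with no relevant results scores 0.0 under this sketch.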
pmid:21346955 pmcid:PMC3041409 fatcat:xnnnmab3c5copobmz5po3u7poa