RedMed: Extending drug lexicons for social media applications [article]

Adam Lavertu, Russ B Altman
<span title="2019-06-06">2019</span> <i title="Cold Spring Harbor Laboratory"> bioRxiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Abstract: In 2017, drug abuse caused over 73,000 deaths in the United States. Emerging drug abuse trends are identified through community surveillance programs, medical claims data, and other healthcare system data. Social media currently exists outside that system and shows promise as an alternative means of monitoring drug abuse, but the data is massive and noisy. Initial attempts to use social media data have relied on exact text matches to drugs of interest, and therefore suffer from the gap between formal drug lexicons and the informal nature of social media. The Reddit comment archive represents an ideal corpus for bridging this gap. We trained a word embedding model, RedMed, to identify and retrieve health entities from Reddit data. We compare the performance of our consumer-generated corpus against publicly available models trained on expert-generated corpora. Our automated pipeline achieves an accuracy of 0.88 and a specificity of >0.9 when classifying across four different term classes. Of all drug mentions, an average of 79% (±0.5%) were exact matches to a generic or trademark drug name, 14% (±0.5%) were misspellings, 6.4% (±0.3%) were synonyms, and 0.13% (±0.05%) were pill marks. We find that our system captures an additional 20% of mentions; these would have been missed by approaches that rely solely on exact string matches. We provide a lexicon of misspellings and synonyms for 2,978 drugs and a word embedding model trained on a health-oriented subset of Reddit.
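The gap between exact string matching and lexicon-extended matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy lexicon, its entries, the class labels, and the `find_mentions` helper are all hypothetical and are not taken from the RedMed lexicon. The sketch only shows how a variant-to-drug mapping lets a matcher recover mentions that an exact-name lookup would skip.

```python
import re

# Hypothetical toy lexicon: surface form -> (canonical drug, term class).
# Entries are illustrative assumptions, NOT drawn from the RedMed lexicon.
TOY_LEXICON = {
    "oxycodone": ("oxycodone", "exact"),
    "oxycodon": ("oxycodone", "misspelling"),
    "oxy": ("oxycodone", "synonym"),
    "alprazolam": ("alprazolam", "exact"),
    "xanax": ("alprazolam", "exact"),       # trademark name counts as exact
    "xanex": ("alprazolam", "misspelling"),
}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def find_mentions(text, lexicon, exact_only=False):
    """Return (token, canonical drug, term class) for each matched token.

    With exact_only=True, only generic/trademark names are matched,
    mimicking an approach that relies solely on exact string matches.
    """
    mentions = []
    for token in TOKEN_RE.findall(text.lower()):
        entry = lexicon.get(token)
        if entry is None:
            continue
        canonical, term_class = entry
        if exact_only and term_class != "exact":
            continue
        mentions.append((token, canonical, term_class))
    return mentions


comment = "Took xanex last night, doc switched me from oxy to Xanax."
exact = find_mentions(comment, TOY_LEXICON, exact_only=True)
extended = find_mentions(comment, TOY_LEXICON)

print(len(exact), "mention(s) found with exact matching")
print(len(extended), "mention(s) found with the extended lexicon")
```

In this toy comment, the exact-only pass finds the trademark name alone, while the extended pass also picks up the misspelling and the synonym. The proportions here are an artifact of the made-up example and say nothing about the rates reported in the abstract.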
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1101/663625">doi:10.1101/663625</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rnkr5hquaveffjxfdhsrihkfiy">fatcat:rnkr5hquaveffjxfdhsrihkfiy</a> </span>