A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="https://www.biorxiv.org/content/biorxiv/early/2019/10/11/663625.full.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Cold Spring Harbor Laboratory">
<span class="release-stage" >pre-print</span>
Abstract: In 2017, drug abuse caused over 73,000 deaths in the United States. Emerging drug abuse trends are identified through community surveillance programs, medical claims data, and other healthcare system data. Social media currently exists outside that system and shows promise as an alternative means of monitoring drug abuse, but the data is massive and noisy. Initial attempts to use social media data have relied on exact text matches to drugs of interest, and therefore suffer from the gap between formal drug lexicons and the informal nature of social media. The Reddit comment archive represents an ideal corpus for bridging this gap. We trained a word embedding model, RedMed, to identify and retrieve health entities from Reddit data. We compare the performance of our consumer-generated corpus against publicly available models trained on expert-generated corpora. Our automated pipeline achieves an accuracy of 0.88 and a specificity of >0.9 when classifying across four different term classes. Of all drug mentions, an average of 79% (±0.5%) were exact matches to a generic or trademark drug name, 14% (±0.5%) were misspellings, 6.4% (±0.3%) were synonyms, and 0.13% (±0.05%) were pill marks. We find that our system captures an additional 20% of mentions; these would have been missed by approaches that rely solely on exact string matches. We provide a lexicon of misspellings and synonyms for 2,978 drugs and a word embedding model trained on a health-oriented subset of Reddit.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1101/663625">doi:10.1101/663625</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rnkr5hquaveffjxfdhsrihkfiy">fatcat:rnkr5hquaveffjxfdhsrihkfiy</a> </span>
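The gap between exact string matching and informal spelling can be illustrated with a small sketch. This is not the authors' pipeline, which relies on a word embedding model; it swaps in plain string similarity from Python's standard `difflib` module. The three-entry lexicon, the sample sentence, the function names, and the 0.85 cutoff are all hypothetical choices made for illustration.

```python
import difflib

# Hypothetical mini-lexicon of formal drug names (not the paper's lexicon).
LEXICON = ["alprazolam", "oxycodone", "gabapentin"]


def exact_mentions(tokens, lexicon=LEXICON):
    """Return the tokens that match a lexicon entry exactly."""
    names = set(lexicon)
    return [t for t in tokens if t.lower() in names]


def fuzzy_mentions(tokens, lexicon=LEXICON, cutoff=0.85):
    """Map each token to its closest lexicon entry when similarity >= cutoff."""
    hits = {}
    for t in tokens:
        # get_close_matches returns up to n candidates, best match first.
        match = difflib.get_close_matches(t.lower(), lexicon, n=1, cutoff=cutoff)
        if match:
            hits[t] = match[0]
    return hits


# Made-up sentence with one exact mention and two misspellings.
tokens = "took alprazolam yesterday then oxycodon and gabapentine".split()

print(exact_mentions(tokens))   # only the correctly spelled name
print(fuzzy_mentions(tokens))   # also recovers the two misspellings
```

String similarity of this kind can only recover misspellings. Synonyms and pill marks generally share no spelling with the formal drug name, which is the case an embedding-based approach is meant to address.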
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200212052418/https://www.biorxiv.org/content/biorxiv/early/2019/10/11/663625.full.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">Web Archive [PDF]</a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1101/663625">biorxiv.org</a>