"Is this Herpes or Syphilis?": Latent Dirichlet Allocation Analysis of Sexually Transmitted Disease-Related Reddit Posts During the COVID-19 Pandemic
Amy K Johnson,
Runa Bhaumik,
Debarghya Nandi,
Abhishikta Roy,
Supriya Mehta
<jats:title>Abstract</jats:title><jats:sec><jats:title>Background</jats:title>Sexually transmitted diseases (STDs) are common and costly, affecting approximately one in five people annually. Reddit, the sixth most used website in the world, is a user-generated social media discussion platform that may be useful for monitoring discussion of STD symptoms and exposures.</jats:sec><jats:sec><jats:title>Objective</jats:title>This study sought to identify patterns in, and derive insights from, STD-related discussions on Reddit over the course of the COVID-19 pandemic.</jats:sec><jats:sec><jats:title>Methods</jats:title>We extracted Reddit posts from March 2019 through July 2021. We applied Latent Dirichlet Allocation (LDA), a machine learning text-mining method, to identify the most common topics discussed in the posts. We then used word clouds, qualitative topic labelling, and spline regression to characterize the content and distribution of the observed topics.</jats:sec><jats:sec><jats:title>Results</jats:title>Our extraction yielded 24,311 posts. LDA coding with 8 topics per time period achieved high coherence values (pre-COVID = 0.41; pre-vaccine = 0.42; post-vaccine = 0.44). While most topic categories remained the same over time, the relative proportion of topics changed and new topics emerged. Spline regression showed that, for some key terms, the percentage of posts varied in ways that coincided with the COVID-19 pre- and post-periods, while for other terms it remained uniform across the study periods.</jats:sec><jats:sec><jats:title>Conclusions</jats:title>Our use of Reddit is a novel way to gain insight into the STD symptoms people experience, potential exposures, testing decisions, common questions, and behavior patterns (e.g., during lockdown periods). For example, reduced STD screening may lead to negative health outcomes because cases are missed, which in turn affects onward transmission.
Because Reddit use is anonymous, users may discuss sensitive topics in greater detail and more freely than in clinical encounters. Data from anonymous Reddit posts may be leveraged to better understand the distribution of disease and the need for targeted outreach and screening programs. This study demonstrates that Reddit is a feasible and useful source for understanding sexual behaviors, STD experiences, and the health engagement the public needs.</jats:sec>
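As an abstract, the Methods summary names LDA but gives no implementation details. The following is a minimal sketch, assuming the posts have already been cleaned and tokenized, of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the authors' pipeline: the abstract does not state which software, priors (`alpha`, `beta`), or coherence measure the study used, and the functions `lda_gibbs`, `top_words`, and `topic_share` are hypothetical names introduced only for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (assumed already cleaned and tokenized).
    Returns (vocab, topic_word, doc_topic): smoothed estimates of
    p(word | topic) and p(topic | document).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional for each topic t.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=5):
    """Highest-probability words per topic (the words a word cloud emphasizes)."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


def topic_share(doc_topic):
    """Mean topic proportions over a set of posts, e.g. one time period."""
    if not doc_topic:
        return []
    k = len(doc_topic[0])
    return [sum(row[t] for row in doc_topic) / len(doc_topic) for t in range(k)]


if __name__ == "__main__":
    # Hypothetical toy posts, not study data.
    posts = [["rash", "sore", "blister"], ["test", "clinic", "result"]] * 5
    vocab, tw, dt = lda_gibbs(posts, num_topics=2, iterations=100, seed=1)
    print(top_words(vocab, tw, n=3))
    print(topic_share(dt))
```

To mirror the per-period comparison described in the Results, one could, under the same assumptions, fit the model separately on each period's posts (pre-COVID, pre-vaccine, post-vaccine) with `num_topics=8` and compare `topic_share` across periods. The coherence values reported above would require a separate coherence measure, which this sketch does not implement.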
Archived file (651.6 kB): www.medrxiv.org (repository); web.archive.org (webarchive)
Date: 2022-02-15