An unsupervised and customizable misspelling generator for mining noisy health-related text sources

Abeed Sarker, Graciela Gonzalez-Hernandez
2018 Journal of Biomedical Informatics  
Data collection and extraction from noisy text sources such as social media typically rely on keyword-based searching or listening. However, health-related terms are often misspelled in such sources because of their complex morphology, so relevant data are excluded from studies. In this paper, we present a customizable, data-centric system that automatically generates common misspellings of complex health-related terms and can thereby improve data collection from noisy text sources.
doi:10.1016/j.jbi.2018.11.007 pmid:30445220 pmcid:PMC6322919 fatcat:jxoqqskvdvavth2c3a6mdtvdiu