An Ensemble Method for Spelling Correction in Consumer Health Questions

Halil Kilicoglu, Marcelo Fiszman, Kirk Roberts, Dina Demner-Fushman
2015 AMIA Annual Symposium Proceedings  
Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation ... . We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves an F1 score of 0.61, compared to an informed baseline of 0.29, achieved using ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is the most relevant feature for spelling error correction in consumer health questions and that frequency and contextual information are complementary to orthographic features.
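
The combination of signals described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the function names, the weights, the distance threshold, and the toy vocabulary are all hypothetical, and a simple co-occurrence overlap stands in for the paper's contextual similarity component, whose details are not given here.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def orthographic_score(token: str, candidate: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    return 1.0 - edit_distance(token, candidate) / max(len(token), len(candidate))


def frequency_score(candidate: str, counts: Counter) -> float:
    """Log-scaled corpus frequency, so common words get a modest boost."""
    return math.log1p(counts[candidate]) / math.log1p(sum(counts.values()))


def context_score(candidate: str, context: list, cooccur: dict) -> float:
    """Assumed stand-in for contextual similarity: the fraction of context
    words previously observed alongside the candidate."""
    if not context:
        return 0.0
    seen = cooccur.get(candidate, set())
    return sum(1 for w in context if w in seen) / len(context)


def best_correction(token, context, counts, cooccur,
                    weights=(0.6, 0.2, 0.2), max_distance=2):
    """Return the vocabulary word with the highest weighted ensemble score.
    The weights and max_distance are illustrative, not taken from the paper."""
    candidates = [w for w in counts if edit_distance(token, w) <= max_distance]
    if not candidates:
        return token  # nothing close enough; leave the token unchanged
    w_o, w_f, w_c = weights

    def score(c):
        return (w_o * orthographic_score(token, c)
                + w_f * frequency_score(c, counts)
                + w_c * context_score(c, context, cooccur))

    return max(candidates, key=score)


# Toy data, invented for illustration only.
COUNTS = Counter({"diabetes": 50, "diabetic": 20, "dialysis": 10})
COOCCUR = {"diabetes": {"type", "insulin"}, "diabetic": {"patient"}}
```

With the toy data, `best_correction("diabetis", ["type", "2", "insulin"], COUNTS, COOCCUR)` considers "diabetes" and "diabetic", which are equally close orthographically, and the frequency and context terms break the tie. That mirrors the abstract's claim that these signals complement orthographic similarity, although the weighting here is arbitrary.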
pmid:26958208 pmcid:PMC4765565 fatcat:koal4byburagrclp26isjlkchm