Normalization of Noisy Text Data

Neelmay Desai, Meera Narvekar
2015 Procedia Computer Science  
The impact of social media and SMS on our daily lives is increasing. These sources provide analysts with large amounts of text data for data mining and pattern discovery. However, this data is notoriously noisy because people use a lot of shorthand, which reduces its utility for analysis. Hence, it is important to convert this noisy text into Standard English. In this paper, we target the not-in-vocabulary (NIV) words present in these sources and propose a method to identify and normalize these NIV words. Compiled databases and context are exploited to replace the ill-formed words and select the best possible correction for each word. This method can also replace internet slang with standard English and, to some extent, correct spelling errors.
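
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: flag tokens that are not in a vocabulary, look up candidate replacements in compiled resources, and use the preceding word as context to pick among them. Everything named here (VOCAB, SLANG, BIGRAMS, candidates, normalize) is a hypothetical toy stand-in, not the authors' databases or algorithm, and the similarity matching from Python's difflib is swapped in purely for illustration.

```python
import difflib

# Hypothetical toy resources (assumptions for illustration only).
VOCAB = {"see", "you", "tomorrow", "great", "to", "the", "are", "at", "school"}
SLANG = {"c": "see", "u": "you", "2moro": "tomorrow", "gr8": "great"}
BIGRAMS = {("see", "you"): 5, ("you", "tomorrow"): 3, ("at", "school"): 4}


def candidates(token):
    """Return possible corrections for a not-in-vocabulary token."""
    if token in SLANG:
        # Direct hit in the slang table.
        return [SLANG[token]]
    # Otherwise fall back to vocabulary words that are spelled similarly.
    return difflib.get_close_matches(token, sorted(VOCAB), n=3, cutoff=0.6)


def normalize(text):
    """Replace NIV tokens, using the previous word as context to rank candidates."""
    out = []
    for token in text.lower().split():
        if token in VOCAB:
            out.append(token)  # already a known word
            continue
        cands = candidates(token)
        if not cands:
            out.append(token)  # no candidate found: leave the token unchanged
            continue
        prev = out[-1] if out else None
        # Prefer the candidate seen most often after the previous word;
        # on ties, max() keeps the earliest (most similar) candidate.
        out.append(max(cands, key=lambda c: BIGRAMS.get((prev, c), 0)))
    return " ".join(out)


print(normalize("c u 2moro"))  # see you tomorrow
```

A real system would presumably need far larger vocabularies and richer context than a single preceding word; the sketch only shows how lookup and context scoring could fit together.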
doi:10.1016/j.procs.2015.03.104 fatcat:nmn5uoda4veybf4rszq7tg4dhu