WASSUP? LOL : Characterizing Out-of-Vocabulary Words in Twitter
[article]
2016
arXiv
pre-print
Language in social media is mostly driven by new words and spellings that are constantly entering the lexicon, thereby polluting it and resulting in a high deviation from the formal written version. The primary entities of such language are the out-of-vocabulary (OOV) words. In this paper, we study various sociolinguistic properties of the OOV words and propose a classification model to categorize them into at least six categories. We achieve 81.26% accuracy with high precision and recall. We
arXiv:1602.00293v1
fatcat:pqetz3mhufbf3mymfun4uafe3i