Learning to Explain Non-Standard English Words and Phrases [article]

Ke Ni, William Yang Wang
2017 arXiv   pre-print
Unlike prior studies that focus on matching keywords from a slang dictionary, we investigate the possibility of learning a neural sequence-to-sequence model that generates explanations of unseen non-standard ... We describe a data-driven approach for automatically explaining new, non-standard English expressions in a given sentence, building on a large dataset that includes 15 years of crowdsourced examples from ... In the NLP community, slang dictionaries are widely used in many tasks and applications (Burfoot and Baldwin, 2009; Wang and McKeown, 2010; Rosenthal and McKeown, 2011). ...
arXiv:1709.09254v1 fatcat:4iaadseezzaqlcv4nyn55nupzu
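A minimal sketch of the kind of sequence-to-sequence setup this abstract describes, assuming PyTorch; the vocabulary size, layer dimensions, and random toy batches below are illustrative placeholders, not the authors' configuration or data.

```python
# Toy encoder-decoder ("explain the non-standard expression in context"),
# loosely following the seq2seq idea in the abstract. Hypothetical sizes/data.
import torch
import torch.nn as nn

VOCAB = 1000       # shared source/target vocabulary (assumption)
EMB, HID = 64, 128

class ExplainSeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.emb(src))               # encode the sentence
        dec_out, _ = self.decoder(self.emb(tgt_in), h)   # teacher-forced decode
        return self.out(dec_out)                         # logits over vocabulary

model = ExplainSeq2Seq()
src = torch.randint(0, VOCAB, (8, 12))      # 8 sentences, 12 tokens each
tgt_in = torch.randint(0, VOCAB, (8, 9))    # explanation prefix (shifted right)
tgt_out = torch.randint(0, VOCAB, (8, 9))   # explanation targets
logits = model(src, tgt_in)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tgt_out.reshape(-1))
loss.backward()
print(float(loss))
```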

A Computational Framework for Slang Generation [article]

Zhewei Sun, Richard Zemel, Yang Xu
2021 arXiv   pre-print
We perform rigorous evaluations on three slang dictionaries and show that our approach not only outperforms state-of-the-art language models, but also better predicts the historical emergence of slang  ...  Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems.  ...  Acknowledgements We thank the anonymous TACL reviewers and action editors for their constructive and detailed comments.  ... 
arXiv:2102.01826v1 fatcat:adiqphp5xzhbrctdmduionr75i

A Computational Framework for Slang Generation

Zhewei Sun, Richard Zemel, Yang Xu
2021 Transactions of the Association for Computational Linguistics  
We perform rigorous evaluations on three slang dictionaries and show that our approach not only outperforms state-of-the-art language models, but also better predicts the historical emergence of slang  ...  Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems.  ...  We thank Walter Rader and Jonathon Green respectively for their permissions to use The Online Slang Dictionary and Green's Dictionary of Slang for our research.  ... 
doi:10.1162/tacl_a_00378 fatcat:h4xpei4fg5hy7llzbskohmihce

A Framework for Pre-processing of Social Media Feeds based on Integrated Local Knowledge Base [article]

Taiwo Kolajo, Olawande Daramola, Ayodele Adebiyi, Aaditeshwar Seth
2020 arXiv   pre-print
To do this, an integrated knowledge base (IKB), which comprises a local knowledge source (Naijalingo), Urban Dictionary, and internet slang, was combined with an adapted Lesk algorithm to facilitate ... Most of the previous studies on the semantic analysis of social media feeds have not considered the ambiguity associated with slang, abbreviations, and acronyms that are embedded in social ... A slang sentiment dictionary was proposed by [44]. Slang words were extracted from Urban Dictionary. Urban Dictionary and related words were exploited to estimate sentiment polarity. ...
arXiv:2006.15854v1 fatcat:xgizqclyqbhbhfccjfgesssbfq
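A minimal sketch of the overlap idea behind a simplified Lesk disambiguation step against an integrated slang knowledge base; the tiny `ikb` dictionary, stopword list, and example tweet are placeholders, not the paper's actual resources.

```python
# Simplified Lesk: pick the sense whose gloss shares the most words with the
# tweet's context. The tiny "ikb" below stands in for the integrated knowledge
# base (local source + Urban Dictionary + internet slang) described above.
import re

ikb = {
    "gist": [
        {"gloss": "the main point or essence of a matter", "label": "essence"},
        {"gloss": "nigerian slang for gossip or casual chat", "label": "gossip"},
    ],
}

STOP = {"the", "a", "an", "of", "or", "for", "to", "and", "is", "me", "i"}

def tokens(text):
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP}

def simplified_lesk(word, context, kb):
    best, best_overlap = None, -1
    for sense in kb.get(word, []):
        overlap = len(tokens(sense["gloss"]) & (tokens(context) - {word}))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

tweet = "come gist me, i want all the gossip from yesterday's party"
print(simplified_lesk("gist", tweet, ikb))   # -> the "gossip" sense
```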

Augmenting semantic lexicons using word embeddings and transfer learning [article]

Thayer Alshaabi, Colin M. Van Oort, Mikaela Irene Fudolig, Michael V. Arnold, Christopher M. Danforth, Peter Sheridan Dodds
2021 arXiv   pre-print
Our first model establishes a baseline employing a simple, shallow neural network initialized with pre-trained word embeddings using a non-contextual approach. ... Sentiment-aware intelligent systems are essential to a wide array of applications. These systems are driven by language models, which broadly fall into two paradigms: lexicon-based and contextual. ...
arXiv:2109.09010v2 fatcat:bjc6dvilgzeirgtr4hvdox3xby
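A rough sketch of a "shallow network on frozen pre-trained, non-contextual embeddings" baseline of the kind the abstract mentions; the random embedding matrix, layer sizes, and fake 1-9 happiness targets are assumptions for illustration only.

```python
# Shallow regressor: frozen pre-trained word vectors -> small MLP -> lexicon score.
# The random "pretrained" matrix stands in for real vectors (e.g. word2vec/GloVe).
import torch
import torch.nn as nn

vocab_size, dim = 5000, 300
pretrained = torch.randn(vocab_size, dim)          # placeholder for real embeddings

model = nn.Sequential(
    nn.Embedding.from_pretrained(pretrained, freeze=True),
    nn.Linear(dim, 64),
    nn.ReLU(),
    nn.Linear(64, 1),                              # predicted sentiment score
)

word_ids = torch.randint(0, vocab_size, (32,))     # a batch of word indices
scores = torch.rand(32, 1) * 8 + 1                 # fake 1-9 happiness ratings
opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
loss = nn.functional.mse_loss(model(word_ids), scores)
loss.backward()
opt.step()
print(float(loss))
```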

When the timeline meets the pipeline: A survey on automated cyberbullying detection

Fatma Elsafoury, Stamos Katsigiannis, Zeeshan Pervez, Naeem Ramzan
2021 IEEE Access  
We also noticed that the literature is not up to date with using more recent slang-based word embeddings like the Urban Dictionary word embeddings; with using more recent models; with using contextual ... To evaluate our hypothesis, we used recently released word embeddings that were pre-trained on slang-based datasets like Urban Dictionary (UD), Sentiment-Specific Word Embedding (SSWE), GloVe-Twitter ...
doi:10.1109/access.2021.3098979 fatcat:l6bwkp5ozvduppcbes4afjqzfm
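For context, a small sketch of how slang-aware pre-trained vectors (Urban Dictionary, GloVe-Twitter, etc.) are typically wired into a downstream classifier: read a GloVe-style text file and build an embedding matrix keyed by the task vocabulary. The file name, dimensionality, and random fallback for unseen words are placeholder assumptions.

```python
# Build an embedding matrix for a task vocabulary from a GloVe-style text file
# (one "word v1 v2 ... vd" line per word). "ud_vectors.txt" is a placeholder name.
import numpy as np

def load_vectors(path):
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vecs

def embedding_matrix(task_vocab, vecs, dim=300):
    # Unknown words (common for fresh slang) get small random vectors.
    mat = np.random.normal(scale=0.1, size=(len(task_vocab), dim)).astype(np.float32)
    for i, word in enumerate(task_vocab):
        if word in vecs:
            mat[i] = vecs[word]
    return mat

# vocab = ["<pad>", "<unk>", "lol", "smh", "yeet", ...]
# mat = embedding_matrix(vocab, load_vectors("ud_vectors.txt"))
# mat can then initialize the embedding layer of a Bi-LSTM or CNN classifier.
```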

Learning a POS tagger for AAVE-like language

Anna Jørgensen, Dirk Hovy, Anders Søgaard
2016 Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
In this paper, we consider the problem of learning a POS tagger for subtitles, lyrics, and tweets associated with African-American Vernacular English (AAVE).  ...  We learn from a mixture of randomly sampled and manually annotated Twitter data and unlabeled data, which we automatically and partially label using mined tag dictionaries.  ...  Word representations learned from representative unlabeled data, such as word clusters or embeddings, have been proven useful for increasing the accuracy of NLP tools for low-resource languages and domains  ... 
doi:10.18653/v1/n16-1130 dblp:conf/naacl/JorgensenHS16 fatcat:xyfqyul5m5dwlgjhgm4qfwaycm
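A toy sketch of the "partially label with mined tag dictionaries" idea: a tagger's per-token scores are restricted to the tags a dictionary licenses for that word, with unlisted words left unconstrained. The tag set, score table, and dictionary entries below are invented for illustration.

```python
# Constrain per-token tag scores to dictionary-licensed tags (type constraints).
TAGS = ["NOUN", "VERB", "ADJ", "PRON"]

# Mined tag dictionary: word -> tags it is allowed to take (toy example).
tag_dict = {"finna": {"VERB"}, "shorty": {"NOUN"}}

def constrained_tag(tokens, scores, tag_dict):
    """scores[i][tag] is the model's score for tag at position i."""
    out = []
    for tok, sc in zip(tokens, scores):
        allowed = tag_dict.get(tok.lower(), set(TAGS))
        out.append(max(allowed, key=lambda t: sc[t]))
    return out

tokens = ["shorty", "finna", "roll"]
scores = [
    {"NOUN": 0.4, "VERB": 0.5, "ADJ": 0.05, "PRON": 0.05},   # would mis-tag as VERB
    {"NOUN": 0.6, "VERB": 0.3, "ADJ": 0.05, "PRON": 0.05},   # would mis-tag as NOUN
    {"NOUN": 0.2, "VERB": 0.7, "ADJ": 0.05, "PRON": 0.05},
]
print(constrained_tag(tokens, scores, tag_dict))   # ['NOUN', 'VERB', 'VERB']
```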

Analysing Cyberbullying using Natural Language Processing by Understanding Jargon in Social Media [article]

Bhumika Bhatia, Anuj Verma, Anjum, Rahul Katarya
2021 arXiv   pre-print
in comparison to models without slang preprocessing. ... We experiment with multiple models such as Bi-LSTM and GloVe, and state-of-the-art models like BERT, and apply a unique preprocessing technique by introducing a slang-abusive corpus, achieving higher precision ... We have elaborated the steps below for preprocessing, by removal of ... a) Formation of Slang Corpus: As the dataset contains the use of urban slang lingo [15], which is extremely common among social ...
arXiv:2107.08902v1 fatcat:udcoi36z4fgapbwpcf6qweb444
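A minimal sketch of the kind of slang-expansion preprocessing step the snippet mentions, assuming a hand-built slang-to-standard mapping; the dictionary entries here are illustrative, not the paper's slang-abusive corpus.

```python
# Replace slang/abbreviations with standard forms before feeding text to a model.
import re

slang_map = {          # toy stand-in for a slang corpus built from Urban Dictionary
    "u": "you",
    "gr8": "great",
    "smh": "shaking my head",
    "af": "very",
}

def normalize_slang(text, mapping):
    def repl(match):
        word = match.group(0)
        return mapping.get(word.lower(), word)
    return re.sub(r"[A-Za-z0-9']+", repl, text)

print(normalize_slang("u are annoying af smh", slang_map))
# -> "you are annoying very shaking my head"
```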

How to Evaluate Word Representations of Informal Domain? [article]

Yekun Chai, Naomi Saphra, Adam Lopez
2019 arXiv   pre-print
Diverse word representations have surged in most state-of-the-art natural language processing (NLP) applications. ... Nevertheless, how to efficiently evaluate such word embeddings in informal domains such as Twitter or forums remains an ongoing challenge due to the lack of sufficient evaluation datasets. ... Urban Dictionary holds the promise of collaborative NLP resources in informal domains such as Twitter and social media forums (Nguyen et al., 2018). ...
arXiv:1911.04669v2 fatcat:kvf3gptduvhhzfcximff6mimia
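As background for the evaluation problem this abstract raises, a small sketch of the standard intrinsic protocol: score word pairs by embedding cosine similarity and correlate with human ratings via Spearman's rho. The vectors and ratings here are fabricated for illustration.

```python
# Intrinsic evaluation: Spearman correlation between cosine similarities of
# word-pair embeddings and human similarity judgements (toy data).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["lit", "fire", "cool", "boring"]}

pairs = [("lit", "fire"), ("lit", "cool"), ("lit", "boring")]
human = [8.5, 7.0, 1.5]           # made-up human similarity ratings

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

model_sims = [cosine(emb[a], emb[b]) for a, b in pairs]
rho, _ = spearmanr(model_sims, human)
print(f"Spearman rho = {rho:.2f}")
```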

Mapping Consumer Sentiment Towards Wireless Services using Geospatial Twitter Data

Weijie Qi, Rob Procter, Jie Zhang, Weisi Guo
2019 IEEE Access  
Hyper-dense wireless network deployment is one of the popular solutions for meeting the high capacity requirements of 5G delivery. ... Current generalized sentiment detection methods with generalized NLP corpora are not topic-specific. ... It uses Amazon Mechanical Turk to label several word lists, such as the original balanced affective word list, internet slang from Urban Dictionary, and obscene words. ...
doi:10.1109/access.2019.2935200 fatcat:swa7vwadujfnhmsoz5j26gu3v4
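A minimal sketch of the word-list scoring approach the snippet alludes to (averaging the lexicon scores of matched words per tweet); the tiny lexicon and its values are placeholders, not the MTurk-labelled lists.

```python
# Average the lexicon scores of matched words to get a per-tweet sentiment value.
import re

lexicon = {"love": 8.4, "great": 7.8, "slow": 3.2, "terrible": 1.9, "signal": 5.0}

def tweet_score(text, lexicon):
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else None

print(tweet_score("Terrible signal again, so slow", lexicon))   # ~3.37
```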

Mining Cross-Cultural Differences and Similarities in Social Media

Bill Yuchen Lin, Frank F. Xu, Kenny Zhu, Seung-won Hwang
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
The framework could be useful for machine translation applications and research in computational social science. ... For instance, people of distinct cultures often hold different opinions on a single named entity. Also, understanding slang terms across languages requires knowledge of cross-cultural similarities. ... Thanks to the anonymous reviewers and Hanyuan Shi for their valuable feedback. We will make our code and data available at https://github.com/adapt-sjtu/socvec. ...
doi:10.18653/v1/p18-1066 dblp:conf/acl/HwangZLX18 fatcat:vpllnkwulvcqphj32bldn344ny

FeelsGoodMan: Inferring Semantics of Twitch Neologisms [article]

Pavel Dolin, Luc d'Hauthuille, Andrea Vattani
2021 arXiv   pre-print
First we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous supervised benchmark by 7.9 percentage points. ... Secondly, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. ... (Wilson et al., 2020) used Urban Dictionary as a corpus to create slang word embeddings. ...
arXiv:2108.08411v2 fatcat:b6dmi3nuhnbqvgecwoxe4xjjeq
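A rough sketch of the embeddings-plus-k-NN enrichment idea described in the abstract: an out-of-vocabulary token inherits the majority sentiment of its k nearest neighbours in embedding space. The vectors, labels, and value of k are placeholder assumptions.

```python
# Assign sentiment to an OOV token from the labels of its k nearest neighbours
# in embedding space (cosine similarity). Toy vectors and labels.
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=25) for w in ["pog", "kekw", "sadge", "monkas", "newemote"]}
labels = {"pog": "positive", "kekw": "positive", "sadge": "negative", "monkas": "negative"}

def knn_sentiment(word, emb, labels, k=3):
    v = emb[word]
    sims = []
    for other, u in emb.items():
        if other == word or other not in labels:
            continue
        sims.append((float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u))), other))
    top = sorted(sims, reverse=True)[:k]
    return Counter(labels[w] for _, w in top).most_common(1)[0][0]

print(knn_sentiment("newemote", emb, labels))
```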

Simple Models for Word Formation in English Slang [article]

Vivek Kulkarni, William Yang Wang
2018 arXiv   pre-print
We propose generative models for three types of extra-grammatical word formation phenomena abounding in English slang: Blends, Clippings, and Reduplicatives.  ...  Overall, our models reveal insights into the generative processes of word formation in slang -- insights which are increasingly relevant in the context of the rising prevalence of slang and non-standard  ...  Acknowledgements We thank members of the UCSB NLP Lab and the anonymous reviewers for their valuable comments and suggestions.  ... 
arXiv:1804.02596v1 fatcat:yoe22t4mkja67a2xftipv3es6u
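A toy sketch of the word-formation phenomena the paper models, not of its generative models themselves: naive candidate blends (prefix + suffix) and back-clippings. The helper names and parameters are invented for illustration.

```python
# Naive candidate generation for two slang word-formation types.
def candidate_blends(w1, w2):
    """All prefix(w1)+suffix(w2) combinations, e.g. breakfast+lunch -> brunch."""
    return {w1[:i] + w2[j:] for i in range(1, len(w1)) for j in range(1, len(w2))}

def clippings(word, min_len=2):
    """Back-clippings, e.g. advertisement -> ad, advert."""
    return [word[:i] for i in range(min_len, len(word))]

print("brunch" in candidate_blends("breakfast", "lunch"))   # True
print(clippings("advertisement")[:4])                       # ['ad', 'adv', 'adve', 'adver']
```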

That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets

William Yang Wang, Diyi Yang
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
large lexical embeddings to create additional training instances significantly improves the lexical model; and incorporating frame-semantic embeddings achieves the best overall performance. ... In quantitative analysis, we show that lexical and syntactic features are useful for automatic categorization of annoying behaviors, and frame-semantic features further boost the performance; that leveraging ... Embeddings trained with 51 million words; 3) Urban Dictionary lexical embeddings trained with 53 million words from slang definitions and examples; 4) Twitter Semantic Frame Embeddings trained with ...
doi:10.18653/v1/d15-1306 dblp:conf/emnlp/WangY15 fatcat:uebggqsymzelbcrboqv5azuz4a
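A small sketch of the embedding-based augmentation idea in the abstract: create extra training instances by swapping words for near neighbours in embedding space. The vocabulary, random vectors, and replacement probability are placeholders.

```python
# Augment a labelled example by replacing words with their nearest embedding
# neighbour (toy vectors; real systems would use large pre-trained embeddings).
import numpy as np

rng = np.random.default_rng(2)
vocab = ["annoying", "irritating", "slow", "sluggish", "walking", "strolling"]
emb = {w: rng.normal(size=20) for w in vocab}

def nearest_neighbor(word, emb):
    v = emb[word]
    best, best_sim = word, -2.0
    for other, u in emb.items():
        if other == word:
            continue
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = other, sim
    return best

def augment(tokens, emb, p=0.3):
    return [nearest_neighbor(t, emb) if t in emb and rng.random() < p else t
            for t in tokens]

sample = ["people", "walking", "so", "slow"]          # label: "annoying behaviour"
print(augment(sample, emb, p=1.0))                    # every known word swapped
```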

Bi-ISCA: Bidirectional Inter-Sentence Contextual Attention Mechanism for Detecting Sarcasm in User Generated Noisy Short Text [article]

Prakamya Mishra, Saroj Kaushik, Kuntal Dey
2021 arXiv   pre-print
Bi-ISCA generates state-of-the-art results on two widely used benchmark datasets for the sarcasm detection task (Reddit and Twitter). ... The proposed deep learning model demonstrates the capability to capture explicit, implicit, and contextual incongruity in words & phrases responsible for invoking sarcasm. ... To solve this problem, these words are converted to their corresponding full forms using abbreviation/slang dictionaries obtained from Urban Dictionary. ...
arXiv:2011.11465v3 fatcat:gi7oqqztg5ewveu53ptmnuveue
Showing results 1–15 of 100.