Towards non-toxic landscapes: Automatic toxic comment detection using DNN
[article]
2020
arXiv
pre-print
The spectacular expansion of the Internet has led to the development of a new research problem in the field of natural language processing: automatic toxic comment detection, since many countries prohibit ...
We compare different unsupervised word representations and different DNN based classifiers. ...
This article aims at designing methods for automatic toxic speech detection on the Internet. ...
arXiv:1911.08395v2
fatcat:t5wv4wpscfgqrh627ch27wfwdq
Transfer Learning for Hate Speech Detection in Social Media
[article]
2022
arXiv
pre-print
These methods and insights hold the potential for safer social media and reduce the need to expose human moderators and annotators to distressing online messaging. ...
This paper uses a transfer learning technique to leverage two independent datasets jointly and builds a single representation of hate speech. ...
The ELMo model is pre-trained for general purposes, and consequently, its constructed embeddings may have limited usefulness for the hate speech detection application. ...
arXiv:1906.03829v2
fatcat:3fp3z7ckgndl5h7aimxe3zhobu
Thai Spelling Correction and Word Normalization on Social Text using a Two-stage Pipeline with Neural Contextual Attention
2020
IEEE Access
., spell checkers) have been used to improve the quality of computerized text by detecting and correcting errors. ...
In this paper, we investigated how current text correction systems perform on correcting errors and word variances in Thai social texts and propose a method designed for this task. ...
ACKNOWLEDGMENTS This work was supported in part by the joint research of Kasikorn Business Technology Group (KBTG) and the Faculty of Engineering, Chulalongkorn University. ...
doi:10.1109/access.2020.3010828
fatcat:7mdoiniof5eahmm32w4beefmgq
Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
2014
Journal of Advanced Research
We propose a fully generative model for the probability distribution of benign (white listed) domain names which can be used in an anomaly detection setting for identifying putative algorithmically generated ...
Since these names are mostly assigned by humans, they are pronounceable, and tend to have a distribution of characters, words, word lengths, and number of words that are typical of some language (mostly ...
We used the logarithm of the joint probability under this model as a test statistic for detection. ...
doi:10.1016/j.jare.2014.01.001
pmid:25685511
pmcid:PMC4294760
fatcat:lpxqtbssefgljiqexfphlaouj4
Codeword Detection, Focusing on Differences in Similar Words Between Two Corpora of Microblogs
2021
Annals of Emerging Technologies in Computing
We proposed new methods for detecting codewords based on differences in word usage and conducted experiments on concealed-word detection to evaluate the effectiveness of the method. ...
Recently, the use of microblogs in drug trafficking has surged and become a social problem. ...
Morphological analysis We focused on Twitter because of its use of short sentences, new words and slang, and limited character length. ...
doi:10.33166/aetic.2021.02.008
fatcat:t3kky4ifbjhrni2n72p7p4agju
Malicious Text Identification: Deep Learning from Public Comments and Emails
2020
Information
Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. ...
We designed a multi-label LSTM model and trained it on the joint datasets including text with common bigrams, extracted from each independent dataset. ...
The model contains 300-dimensional vectors for 3 million words and sentences that can be used to create word embeddings for a specific dataset. ...
doi:10.3390/info11060312
fatcat:3cbqddu4sfhe3ai2w5g7ejff24
A ML and NLP based Framework for Sentiment Analysis on Bigdata
2020
International journal of recent technology and engineering
In other words, social feedback on products and services is available. ...
Usage of probabilistic topic model is a novel approach in sentiment analysis. In this paper, we proposed a framework for comprehensive analysis of overall and aspect-based sentiments. ...
Three models are used for document vector generation. They are known as count model, TF-IDF model and word embeddings model. Word embeddings model is known as GoogleNews-vectors-negative300. ...
doi:10.35940/ijitee.d9062.029420
fatcat:nhyddtiqzradbpfhcgmiz5tg6m
LINGUISTIC ESSENCE OF COMPUTER AND INTERNET JARGONS
2020
Philology matters
The literary language and the language of the science and technology practically use the commonly-used words and scientific lexical units. ...
In the modern English and Uzbek languages jargons are widely used in terms of many concepts related to computer and the Internet activities. ...
The group of analyzed slang vocabulary consists of words that describe the process of working on the Internet: cobsite - an outdated, not updated site, spam - the names of types of advertising embedded in ...
doi:10.36078/987654465
fatcat:n3l2ttajczbbzb4xhucj447rvy
SocialNLP 2018 EmotionX Challenge Overview: Recognizing Emotions in Dialogues
2018
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media
The best team achieves the unweighted accuracy 62.48 and 62.5 on EmotionPush and Friends, respectively. ...
Organizers provide baseline results. 18 teams registered in this challenge and 5 of them submitted their results successfully. ...
For the SmartDubai team, they use word and character TF-IDF independently with logistic regression. ...
doi:10.18653/v1/w18-3505
dblp:conf/acl-socialnlp/HsuK18
fatcat:vyfatxzycrbkhedmgm7cbrbqya
Authorship Attribution in Bangla literature using Character-level CNN
[article]
2020
arXiv
pre-print
The time and memory efficiency of the proposed model is much higher than the word level counterparts but accuracy is 2-5% less than the best performing word-level models. ...
A comparison of various word-based models is performed and shows that the proposed model performs increasingly better with larger datasets. ...
This concept can be leveraged to use character embeddings to fit misspelled words, rare or new words, slangs or emoticons. ...
arXiv:2001.05316v1
fatcat:ulmqi25ozjh6red5qgah4mlc6y
Text Analysis and Machine Learning Approach to Phished Email Detection
2019
International Journal of Computer Applications
(AI) that uses the method of data mining to find out new or existing characteristics from a set of gathered data which can be relevant for classification. ...
Machine learning methods have been found to achieve much better results than other phished email detection techniques such as blacklists, visual similarity and heuristic techniques. ...
vector created using word embedding discussed in section 3.3 was used for the training and testing of the classifiers using 10-fold cross validation. ...
doi:10.5120/ijca2019918354
fatcat:siwkv5n5izb7vco3dam57kvaie
Characterization of citizens using word2vec and latent topic analysis in a large set of tweets
2019
Cities
With the increasing use of the Internet and mobile devices, social networks are becoming the most used media to communicate citizens' ideas and thoughts. ...
Results show that the proposed method is an interesting tool to characterize a city population based on machine learning methods and text analytics. ...
We selected this model because it has been the seed for all word embedding models, and it is the most widely used model, despite the existence of newer and very successful word embedding models such as ...
doi:10.1016/j.cities.2019.03.019
fatcat:2z4rzz32jrdn7lxuu3wd4ilxri
Aspect-Based Sentiment Analysis Using Hybrid CNN-SVM with Particle Swarm Optimization for Domain Independent Datasets
2020
International Journal of Emerging Trends in Engineering Research
In this paper, we suggested a novel intelligent framework based on a hybrid convolutional neural network and support vector machine (SVM) for aspect-based sentiment detection and classification of online product ...
However, building a powerful hybrid aspect-based sentiment analysis model utilizing CNN can be highly complex and expensive. ...
Sentence level embedding. A sentence x with m words is provided {w_1, w_2, ..., w_m} which is then translated in to joints of word level and {u_1, u_2, ..., u_n} are embedding level character. ...
doi:10.30534/ijeter/2020/628102020
fatcat:bixgm5d7fvbqraizwbc73a3q3q
Improving Adverse Drug Event Extraction with SpanBERT on Different Text Typologies
[article]
2021
arXiv
pre-print
In recent years, Internet users are reporting Adverse Drug Events (ADE) on social media, blogs and health forums. ...
We propose for the first time the use of the SpanBERT architecture for the task of ADE extraction: this new version of the popular BERT transformer showed improved capabilities with multi-token text spans ...
In addition, in short and highly contextual language, such as the one used in social media - which is characterized by acronyms, slang, metaphors, etc. ...
arXiv:2105.08882v1
fatcat:gytfiem6u5dbdolqstu7wckcqq
Improving Named Entity Recognition in Tweets via Detecting Non-Standard Words
2015
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Second, this paper investigates two methods using NSW detection results for named entity recognition (NER) in social media data. ...
One adopts a pipeline strategy, and the other uses a joint decoding fashion. We also create a new data set with newly added normalization annotation beyond the existing named entity labels. ...
Acknowledgments We thank the anonymous reviewers for their detailed and insightful comments on earlier drafts of this paper. The work is partially supported by DARPA Contract No. FA8750-13-2-0041. ...
doi:10.3115/v1/p15-1090
dblp:conf/acl/LiL15
fatcat:ycelfphrwzdhhlr235tfigonfu