
N-Gram Representations For Comment Filtering

Dirk Brand, Steve Kroon, Brink van der Merwe, Loek Cleophas
2015 Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists - SAICSIT '15  
We find that the N-gram representations greatly outperform manual feature extraction techniques.  ...  Accurate classifiers for short texts are valuable assets in many applications.  ...  For character N-grams, all the N-grams for values of 2 ≤ N ≤ 8 are included in a single representation (denoted by "C28").  ... 
doi:10.1145/2815782.2815789 dblp:conf/saicsit/BrandKMC15 fatcat:zuldiokcizeunbpggzepdpmkb4
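The "C28" setup quoted in the snippet above, where every character n-gram for 2 ≤ N ≤ 8 lands in one combined bag of features, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function name and return type are choices made here.

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=8):
    """Collect all character n-grams of `text` for n_min <= n <= n_max
    into a single bag of features (the "C28" setup when n_min=2, n_max=8)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

# A short comment yields overlapping bigrams and trigrams in one bag:
features = char_ngrams("spam!", n_min=2, n_max=3)
# bigrams: "sp", "pa", "am", "m!"; trigrams: "spa", "pam", "am!"
```

A classifier would then consume these counts as a sparse feature vector, one dimension per distinct n-gram.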

Emoji Identification and Prediction in Hebrew Political Corpus

2019 Issues in Informing Science and Information Technology  
We compare two text representation approaches, i.e., n-grams and character n-grams, and analyze the contribution of additional metadata features to the classification.  ...  Recommendations for Practitioners: In many cases the classifier's decision seems a better fit for the comment content than the emoji chosen by the commentator.  ...  the n-gram and character n-gram representations for the emoji identification task.  ... 
doi:10.28945/4372 fatcat:l3f4l65iezgxfpf6vjpfsic4yu

Sentiment Analysis for Software Engineering Domain in Turkish

Mansur Alp TOÇOĞLU
2020 Sakarya University Journal of Computer and Information Sciences  
In the experimental analysis, we first focus on achieving classification results by using three conventional text representation schemes and three N-gram models in conjunction with five classifiers (i.e  ...  The focus of this study is to provide a model for identifying the sentiments of comments about the education and professional life of software engineering on social media and microblogging sites  ...  Feature Extraction Schemes: N-gram modelling is a popular feature representation scheme for language modelling and natural language processing tasks.  ... 
doi:10.35377/saucis.03.03.769969 fatcat:gbkk4hz4uvfmhb65sjc4xszaqq

An approach to spam comment detection through domain-independent features

Jong Myoung Kim, Zae Myung Kim, Kwangjo Kim
2016 2016 International Conference on Big Data and Smart Computing (BigComp)  
To evaluate the first measure, experiments on detecting blog-spam comments are conducted. As for the second measure, we employ SVM on the ID space of e-mail data collected by "Apache Spam Assassin".  ...  Nowadays, these commercially oriented spams are well detected; the real challenge lies in filtering rather vague spams that do not exhibit distinctive spam keywords.  ...  N-gram for feature representation Most spam classifications use words or phrases as a feature.  ... 
doi:10.1109/bigcomp.2016.7425926 dblp:conf/bigcomp/KimKK16 fatcat:5rhdx4i2ubg5faxhzg6kd3jaia

Detecting spam comments on Indonesia's Instagram posts

Ali Akbar Septiandri, Okiriza Wibisono
2017 Journal of Physics: Conference Series  
fastText and our proposed basic features and keyword patterns turned out to be the best models to identify spam comments.  ...  In more recent work [7] , we can see the improved version of skip-gram model [8] , "where each word is represented as a bag-of-character n-grams."  ...  Methodology Features We used several techniques for representing the comments as follows: (i) Binary Bag-of-Words with LSA; (ii) TF-IDF with LSA; (iii) Word2Vec using skip-gram model.  ... 
doi:10.1088/1742-6596/801/1/012069 fatcat:b3vwdqbrvbdltleecyokiy2np4
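Of the comment representations this snippet lists, the TF-IDF weighting step can be sketched with the standard library alone. This is a minimal illustration of plain TF-IDF (term frequency times log inverse document frequency); the LSA step the paper pairs it with, a truncated SVD of the resulting term-document matrix, is omitted here, and the function name is a choice made for this sketch.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each token of each tokenised comment by tf * log(N / df).
    `docs` is a list of token lists; returns one {token: weight} dict per doc."""
    n = len(docs)
    df = Counter()                      # document frequency of each token
    for doc in docs:
        df.update(set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency in this comment
        weighted.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weighted

docs = [["free", "followers", "free"], ["nice", "photo"], ["free", "photo"]]
vecs = tf_idf(docs)
```

Tokens that appear in every comment get weight zero, which is what pushes spam-specific words like "followers" up the ranking.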

Molding CNNs for text: non-linear, non-consecutive convolutions [article]

Tao Lei, Regina Barzilay, Tommi Jaakkola
2015 arXiv   pre-print
Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage.  ...  Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words.  ...  Acknowledgments We thank Kai Sheng Tai, Mohit Iyyer and Jordan Boyd-Graber for answering questions about their paper. We also thank Yoon Kim, the MIT NLP group and the reviewers for their comments.  ... 
arXiv:1508.04112v2 fatcat:t472virw2jceblpdv5zr7wwa6y

Molding CNNs for text: non-linear, non-consecutive convolutions

Tao Lei, Regina Barzilay, Tommi Jaakkola
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage.  ...  Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words.  ...  Acknowledgments We thank Kai Sheng Tai, Mohit Iyyer and Jordan Boyd-Graber for answering questions about their paper. We also thank Yoon Kim, the MIT NLP group and the reviewers for their comments.  ... 
doi:10.18653/v1/d15-1180 dblp:conf/emnlp/LeiBJ15 fatcat:hdjayrzlzzcl3gv4wit6kbt36a
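The idea of extending n-grams to non-consecutive words, as in the snippet above, can be illustrated by enumerating word pairs with intervening words and down-weighting them by the size of the gap. This is only a feature-level sketch of the pattern the paper's convolution captures, not its low-rank tensor formulation; the decay value is an arbitrary illustrative choice.

```python
def nonconsecutive_bigrams(tokens, decay=0.5):
    """Enumerate all ordered word pairs (consecutive or not), weighting
    each occurrence by decay**gap, where gap is the number of
    intervening words."""
    feats = {}
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            gap = j - i - 1
            key = (tokens[i], tokens[j])
            feats[key] = feats.get(key, 0.0) + decay ** gap
    return feats

feats = nonconsecutive_bigrams(["not", "very", "good"])
# ("not", "good") is captured with weight 0.5 despite the intervening word
```

Capturing ("not", "good") directly is exactly what a consecutive bigram model misses in phrases like "not very good".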

Insult detection using a partitional CNN-LSTM model

Mohamed Maher Ben Ismail
2020 Computer Science and Information Technologies  
The resulting local information is then sequentially exploited across partitions using LSTM for verbal offense detection.  ...  The combination of the partitional CNN and LSTM integrates the local within-comment information and the long-distance correlations across comments.  ...  As illustrated in Figure 1, C convolutional filters are used for each partition to extract the local n-gram features.  ... 
doi:10.11591/csit.v1i2.p84-92 fatcat:dwnafzpevbhypbkh5kbamdmoke

Aesthetic Image Captioning From Weakly-Labelled Photographs [article]

Koustav Ghosal, Aakanksha Rana, Aljosa Smolic
2019 arXiv   pre-print
We propose a probabilistic caption-filtering method for cleaning the noisy web-data, and compile a large-scale, clean dataset "AVA-Captions" (230,000 images with 5 captions per image).  ...  The strategy is weakly supervised and can be effectively used to learn rich aesthetic representations, without requiring expensive ground-truth annotations.  ...  In this work, we propose to clean the raw captions from AVA by proposing a probabilistic n-gram based filtering strategy.  ... 
arXiv:1908.11310v1 fatcat:6stzxiowb5dmheme4yrhnpdvku

Aesthetic Image Captioning From Weakly-Labelled Photographs

Koustav Ghosal, Aakanksha Rana, Aljosa Smolic
2019 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)  
We propose a probabilistic caption-filtering method for cleaning the noisy web-data, and compile a large-scale, clean dataset 'AVA-Captions' (∼230,000 images with ∼5 captions per image).  ...  The strategy is weakly supervised and can be effectively used to learn rich aesthetic representations, without requiring expensive ground-truth annotations.  ...  In this work, we propose to clean the raw captions from AVA by proposing a probabilistic n-gram based filtering strategy.  ... 
doi:10.1109/iccvw.2019.00556 dblp:conf/iccvw/GhosalRS19 fatcat:w243egxxo5d5nkbdw7k7sjlsay

Utilizing FastText for Venue Recommendation [article]

Makbule Gulcin Ozsoy
2020 arXiv   pre-print
Traditional recommendation systems employ collaborative filtering, content-based filtering or matrix factorization.  ...  Recently, vector space embeddings and deep learning algorithms have also been used for recommendation.  ...  Fig. 4: Performance results of Skip-gram (Seq-Single-Max) for different vector sizes (size) and different lengths of character n-grams (max n).  ... 
arXiv:2005.12982v1 fatcat:hgrva6g25nhqfptbsihnjvx7oa

Automatic categorisation of comments in social news websites

Igor Santos, Jorge de-la-Peña-Sordo, Iker Pastor-López, Patxi Galán-García, Pablo G. Bringas
2012 Expert systems with applications  
type of information contained in the comment and the controversy level of the comment.  ...  In particular, social news sites link stories, and different users can comment on them.  ...  Second, we have used an n-gram approach as terms.  ... 
doi:10.1016/j.eswa.2012.05.061 fatcat:4aqtnqvxljdlbjhocmupvs4gem

A Large Self-Annotated Corpus for Sarcasm [article]

Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli
2018 arXiv   pre-print
We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection.  ...  We evaluate the corpus for accuracy, construct benchmarks for sarcasm detection, and evaluate baseline methods.  ...  Bag-of-n-Grams The Bag-of-n-Grams representation consists of using a document's n-gram counts as features in a vector. We test two variants, the Bag-of-Words and the Bag-of-Bigrams.  ... 
arXiv:1704.05579v4 fatcat:2nvi2pwqbzerljyqxbtz5fwd7m
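The Bag-of-n-Grams baseline described in the snippet above, with its Bag-of-Words (n=1) and Bag-of-Bigrams (n=2) variants, amounts to counting the n-grams of a token sequence. A minimal sketch, with names chosen here for illustration:

```python
from collections import Counter

def bag_of_ngrams(tokens, n):
    """Count the word n-grams of a token sequence: n=1 gives
    Bag-of-Words, n=2 gives Bag-of-Bigrams."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "oh great another meeting".split()
bow = bag_of_ngrams(tokens, 1)      # unigram counts
bigrams = bag_of_ngrams(tokens, 2)  # e.g. ("oh", "great")
```

Each distinct n-gram becomes one dimension of the feature vector, with its count as the value.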

Neural Character-based Composition Models for Abuse Detection [article]

Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova
2018 arXiv   pre-print
In this paper, we address this problem by designing a model that can compose embeddings for unseen words.  ...  However, in using a single embedding for all unseen words we lose the ability to distinguish between obfuscated and non-obfuscated or rare words.  ...  Acknowledgements Special thanks to the anonymous reviewers for their valuable comments and suggestions.  ... 
arXiv:1809.00378v1 fatcat:eetftlviw5hk7myihjixevgbha

Neural Character-based Composition Models for Abuse Detection

Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova
2018 Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)  
Acknowledgements Special thanks to the anonymous reviewers for their valuable comments and suggestions.  ...  This is expected since words are not very long sequences, and the filters of the CNN are able to capture the different character n-grams within them.  ...  representations, followed by an LR layer to classify the comments based on those representations.  ... 
doi:10.18653/v1/w18-5101 dblp:conf/acl-alw/MishraYS18 fatcat:y7mbghaq6japjkuojozdhwhtjm
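Composing an embedding for an unseen (possibly obfuscated) word, as these snippets describe, can be illustrated with one simple scheme: averaging the vectors of the word's character n-grams, in the style of fastText. The paper's own composition model is CNN-based; the trigram choice, padding markers, and dimensionality below are illustrative assumptions.

```python
def compose_embedding(word, ngram_vecs, n=3, dim=4):
    """Compose a vector for `word` by averaging the pre-trained vectors
    of its character n-grams (with boundary markers '<' and '>').
    Returns the zero vector if no n-gram of the word is known."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    known = [ngram_vecs[g] for g in grams if g in ngram_vecs]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]

# Toy pre-trained n-gram vectors; an obfuscated word still hits
# several known character trigrams:
vecs = {"<id": [1, 0, 0, 0], "idi": [0, 1, 0, 0], "ot>": [0, 0, 1, 0]}
emb = compose_embedding("idiot", vecs)
```

Because obfuscations usually preserve most character n-grams of the original word, the composed vector stays close to the clean word's embedding, which is what makes such models robust to spelling tricks.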
Showing results 1 — 15 out of 14,717 results