Author identification: Using text sampling to handle the class imbalance problem
2008
Information Processing & Management
Authorship analysis of electronic texts assists digital forensics and anti-terror investigation. Author identification can be seen as a single-label multi-class text categorization problem. ...
The main idea is to segment the training texts into text samples according to the size of the class, thus producing a fairer classification model. ...
Evaluation results In order to evaluate the performance of a method handling the class imbalance problem we need a baseline. ...
doi:10.1016/j.ipm.2007.05.012
fatcat:uv62jr5gingkjo6dc76qwxpwk4
Author Identification in Imbalanced Sets of Source Code Samples
2012
2012 IEEE 24th International Conference on Tools with Artificial Intelligence
Although very promising results have been reported for this task, the evaluation of existing approaches avoids focusing on the class imbalance problem and its effect on the performance. ...
Source code author identification can be viewed as a text classification task given that samples of known authorship by a set of candidate authors are available. ...
Previous studies on source code author identification have only superficially dealt with the class imbalance problem. ...
doi:10.1109/ictai.2012.112
dblp:conf/ictai/ChatzicharalampousFS12
fatcat:gzyd3qumfrfnpj6fuj2u2hebem
Author Identification Using Semi-supervised Learning - Notebook for PAN at CLEF 2011
2011
Conference and Labs of the Evaluation Forum
Author identification models fall into two major categories according to the way they handle the training texts: profile-based models produce one representation per author while instance-based models produce ...
The evaluation results on closed-set author identification are encouraging, especially when the set of candidate authors is large. ...
This may happen when there are very limited training texts for one candidate author (i.e., the class imbalance problem). ...
dblp:conf/clef/KourtisS11
fatcat:jqp4za4nibbxjbtxs4dsi3q4he
Author identification in bibliographic data using deep neural networks
2021
TELKOMNIKA (Telecommunication Computing Electronics and Control)
A constructive approach for resolving name ambiguity is to use computer algorithms to identify author names. ...
The raw data is grouped into four classes, i.e., synonyms, homonyms, homonyms-synonyms, and non-homonyms-synonyms. ...
ACKNOWLEDGEMENTS We thank the Ministry of Research, Technology, and Higher Education, Republic of Indonesia (Kemenristekdikti RI), for funding the research on "Penelitian Disertasi Doktor" Research Grant ...
doi:10.12928/telkomnika.v19i3.18877
fatcat:klasur3jtzcfzmewypo4kqxcdm
Influence of features discretization on accuracy of random forest classifier for web user identification
2017
2017 20th Conference of Open Innovations Association (FRUCT)
To evaluate the influence, a series of experiments was carried out on a text corpus containing Russian online texts of different genres and topics. ...
To simulate the real-world situation, data sets with various levels of class imbalance and amounts of training text per user were used. ...
doi:10.23919/fruct.2017.8071354
dblp:conf/fruct/Vorobeva17
fatcat:t7rjyifqejdolf5doeucs7j6ra
DLRG@HASOC 2020: A Hybrid Approach for Hate and Offensive Content Identification in Multilingual Tweets
2020
Forum for Information Retrieval Evaluation
To address the problem of class imbalance, we have combined an oversampling technique with a suitable feature weighting method. ...
To handle these problems, automated methods are necessary that can help to analyse the social media posts and to identify the hate speech. ...
The authors would like to thank the management of Vellore Institute of Technology, Chennai for providing the support to carry out this work. ...
dblp:conf/fire/BR20
fatcat:nfdvwrresnaddkpepvglzioh3q
A Spammer Identification Method for Class Imbalanced Weibo Datasets
2019
IEEE Access
Considering the existence of imbalance problems in spammer identification, an ensemble learning method is used to combine multiple base classifiers for improving the learning performance. ...
However, most of the previous studies overlook the class imbalance problem of real-world data. ...
[8] first investigated the class imbalance problem in the domain of Twitter spammer detection, and then combined FOS with ensemble learning to handle the class imbalance problem in Twitter datasets. ...
doi:10.1109/access.2019.2901756
fatcat:afsn4epixrdvnliltjkajevtoi
Expert System for the Identification of Review Papers Using Ensemble Learning
2021
Pakistan Social Sciences Review
The presented algorithm provides an efficient alternative to existing algorithms by combining the strengths of the Multiboost ensemble with the sampling technique. ...
The objective of this work was to apply a machine learning technique to automatically identify review articles given the imbalanced representation of publication types in publications. ...
Two key techniques to handle the class imbalance problem are data sampling and ensemble learning (Batista et al., 2004). ...
doi:10.35484/pssr.2021(5-i)38
fatcat:f5michv2ibhx3gh6qwrz5prhve
Enhancing the Identification of Cyberbullying through Participant Roles
[article]
2020
arXiv
pre-print
We utilise a dataset from ASKfm to perform multi-class classification to detect participant roles (e.g. victim, harasser). ...
Cyberbullying is a prevalent social problem that inflicts detrimental consequences to the health and safety of victims such as psychological distress, anti-social behaviour, and suicide. ...
Acknowledgments Authors would like to acknowledge the researchers on the AMiCA project for sharing the dataset. ...
arXiv:2010.06640v2
fatcat:3fndeuvydjbfxkjiciuwtqu4hm
Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts
[article]
2021
arXiv
pre-print
In non-native English spoken countries, social media users mostly use a code-mixed form of text in their posts/comments. ...
The current study presents extensive experiments using multiple deep learning, and transfer learning models to detect offensive content on YouTube. ...
Acknowledgements We would like to express our thanks to Mr. Sanjeepan Sivapiran and Mr. Temcious Fernando for their helpful suggestions to improve and clarify this manuscript. ...
arXiv:2108.10939v2
fatcat:i5cydkgna5cuvijqqspjbv2ewe
Structure-based identification of catalytic residues
2011
Proteins: Structure, Function, and Bioinformatics
ACKNOWLEDGMENTS The authors thank El-ad David Amir for the helpful discussion and insightful comments. The authors are grateful to Ms. Rise Silverman and Ms. Edna Oxman for carefully editing the manuscript. ...
Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. ...
doi:10.1002/prot.23020
pmid:21491495
pmcid:PMC3092797
fatcat:ub6lfl4xkbea7co3uem3bxdhbq
Pretrained Transformers for Offensive Language Identification in Tanglish
[article]
2021
arXiv
pre-print
After the task deadline, we sampled the dataset uniformly and used the MuRIL pretrained model, which helped us achieve a weighted average score of 0.67, the top score in the leaderboard. ...
This paper describes the system submitted to Dravidian-Codemix-HASOC2021: Hate Speech and Offensive Language Identification in Dravidian Languages (Tamil-English and Malayalam-English). ...
After the task deadline, we sample the dataset uniformly to handle the class imbalance problem in this dataset, which helps us improve our score. ...
arXiv:2110.02852v4
fatcat:tsuszyi2grea5lqfflrn5ovadi
Author Profiling in Social Media with Multimodal Information
2020
Journal of Computacion y Sistemas
Determining aspects of a person such as gender, age, residency, and occupation, among others, through his/her texts is a natural language processing task known as author profiling. ...
In this thesis work, we propose a solution for the task of profiling authors in social networks. ...
algorithm is to produce a function to map a sample from the attribute space to a class label, i.e., f(x) : x → c ∈ C, where x is the sample and C is the set of class labels. ...
doi:10.13053/cys-24-3-3488
fatcat:dj6rhmdbf5grhngea6hwocxo7q
Emotion Identification in Movies through Facial Expression Recognition
2021
Applied Sciences
To cope with the lack of datasets for the scope under analysis, we demonstrated the feasibility of using a generic dataset for the training process and proposed a new way to look at emotions ...
We presented a comprehensive overview of the most relevant datasets used for FER, highlighting problems caused by their heterogeneity and by the absence of a universal model of emotions. ...
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/app11156827
fatcat:gbg5bntpbrfmldmrpiqkpdz5fy
ICLSTM: Encrypted Traffic Service Identification Based on Inception-LSTM Neural Network
2021
Symmetry
To alleviate the problem of category imbalance, different weight parameters are set for each category separately in the training phase to make it more symmetrical for different categories of encrypted ...
This method converts traffic data into common gray images, and then uses the constructed ICLSTM neural network to extract key features and perform effective traffic classification. ...
For the class imbalance problem, we consider assigning different weights to different classes separately to reduce the impact of class imbalance on the classification effect. ...
doi:10.3390/sym13061080
fatcat:z5kubpmdsfhghgctg2neg6lkl4