15,843 Hits in 7.7 sec

Author identification: Using text sampling to handle the class imbalance problem

Efstathios Stamatatos
2008 Information Processing & Management  
Authorship analysis of electronic texts assists digital forensics and anti-terror investigation. Author identification can be seen as a single-label multi-class text categorization problem.  ...  The main idea is to segment the training texts into text samples according to the size of the class, thus producing a fairer classification model.  ...  Evaluation results In order to evaluate the performance of a method handling the class imbalance problem we need a baseline.  ... 
doi:10.1016/j.ipm.2007.05.012 fatcat:uv62jr5gingkjo6dc76qwxpwk4
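The text-sampling idea in the Stamatatos (2008) snippet can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure. The function name `segment_by_class_size`, the character-level segmentation, and the `samples_per_author` and `min_length` parameters are all hypothetical. Each author's training texts are concatenated and cut into roughly the same number of samples, so a larger class yields longer samples rather than more of them.

```python
import math
from typing import Dict, List


def segment_by_class_size(texts_by_author: Dict[str, List[str]],
                          samples_per_author: int = 10,
                          min_length: int = 100) -> Dict[str, List[str]]:
    """Cut each author's concatenated training text into about
    samples_per_author pieces, so sample length scales with class size.
    Hypothetical helper: parameters are illustrative, not from the paper."""
    segmented: Dict[str, List[str]] = {}
    for author, texts in texts_by_author.items():
        full_text = " ".join(texts)
        # Large classes get long segments, small classes short ones; the segment
        # length is never set below min_length (the final piece may be shorter).
        seg_len = max(min_length, math.ceil(len(full_text) / samples_per_author))
        segmented[author] = [full_text[i:i + seg_len]
                             for i in range(0, len(full_text), seg_len)]
    return segmented


# Hypothetical toy corpus: author A has five times more text than author B.
corpus = {"A": ["x" * 5000], "B": ["y" * 1000]}
samples = segment_by_class_size(corpus)
print({a: (len(s), len(s[0])) for a, s in samples.items()})
# {'A': (10, 500), 'B': (10, 100)}
```

Both authors end up with 10 training samples, 500 characters each for A and 100 characters each for B, so the classifier sees a balanced number of instances per class.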

Author Identification in Imbalanced Sets of Source Code Samples

E. Chatzicharalampous, G. Frantzeskou, E. Stamatatos
2012 2012 IEEE 24th International Conference on Tools with Artificial Intelligence  
Although very promising results have been reported for this task, the evaluation of existing approaches avoids focusing on the class imbalance problem and its effect on the performance.  ...  Source code author identification can be viewed as a text classification task given that samples of known authorship by a set of candidate authors are available.  ...  Previous studies on source code author identification have only superficially dealt with the class imbalance problem.  ... 
doi:10.1109/ictai.2012.112 dblp:conf/ictai/ChatzicharalampousFS12 fatcat:gzyd3qumfrfnpj6fuj2u2hebem

Author Identification Using Semi-supervised Learning - Notebook for PAN at CLEF 2011

Ioannis Kourtis, Efstathios Stamatatos
2011 Conference and Labs of the Evaluation Forum  
Author identification models fall into two major categories according to the way they handle the training texts: profile-based models produce one representation per author while instance-based models produce  ...  The evaluation results on closed-set author identification are encouraging, especially when the set of candidate authors is large.  ...  This may happen when there are very limited training texts for one candidate author (i.e., the class imbalance problem).  ... 
dblp:conf/clef/KourtisS11 fatcat:jqp4za4nibbxjbtxs4dsi3q4he

Author identification in bibliographic data using deep neural networks

Firdaus Firdaus, Siti Nurmaini, Reza Firsandaya Malik, Annisa Darmawahyuni, Muhammad Naufal Rachmatullah, Andre Herviant Juliano, Tio Artha Nugraha, Varindo Ockta Keneddi Putra
2021 TELKOMNIKA (Telecommunication Computing Electronics and Control)  
A constructive approach for resolving name ambiguity is to use computer algorithms to identify author names.  ...  The raw data is grouped into four classes, i.e., synonyms, homonyms, homonyms-synonyms, and non-homonyms-synonyms classification.  ...  ACKNOWLEDGEMENTS We thank the Ministry of Research, Technology, and Higher Education, Republic of Indonesia (Kemenristekdikti RI), for funding the research on "Penelitian Disertasi Doktor" Research Grant  ... 
doi:10.12928/telkomnika.v19i3.18877 fatcat:klasur3jtzcfzmewypo4kqxcdm

Influence of features discretization on accuracy of random forest classifier for web user identification

Alisa A. Vorobeva
2017 2017 20th Conference of Open Innovations Association (FRUCT)  
Data sets with various levels of class imbalance and amounts of training texts per user were used.  ...  To evaluate the influence, a series of experiments was carried out on a text corpus containing Russian online texts of different genres and topics.  ...  To simulate the real-world situation, data sets with various levels of class imbalance and amounts of training texts per user were used.  ... 
doi:10.23919/fruct.2017.8071354 dblp:conf/fruct/Vorobeva17 fatcat:t7rjyifqejdolf5doeucs7j6ra

DLRG@HASOC 2020: A Hybrid Approach for Hate and Offensive Content Identification in Multilingual Tweets

Yashwanth Reddy B., Ratnavel Rajalakshmi
2020 Forum for Information Retrieval Evaluation  
To address the problem of class imbalance, we have combined an oversampling technique with a suitable feature weighting method.  ...  To handle these problems, automated methods are necessary that can help to analyse the social media posts and to identify the hate speech.  ...  The authors would like to thank the management of Vellore Institute of Technology, Chennai for providing the support to carry out this work.  ... 
dblp:conf/fire/BR20 fatcat:nfdvwrresnaddkpepvglzioh3q
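The snippet above does not name the oversampling technique, so the sketch below uses plain random oversampling with replacement as a stand-in. The function `random_oversample` and the toy tweets and labels are hypothetical, and the feature weighting step the authors combine it with is omitted.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(samples: Sequence[T], labels: Sequence[str],
                      seed: int = 0) -> Tuple[List[T], List[str]]:
    """Duplicate randomly chosen examples of each smaller class until every
    class is as large as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        # Draw with replacement to close the gap (zero draws for the largest class).
        extra = rng.choices(pool, k=target - count)
        out_x.extend(extra)
        out_y.extend([label] * len(extra))
    return out_x, out_y


# Hypothetical toy data: four "NOT" tweets and one "HOF" tweet.
tweets = ["t1", "t2", "t3", "t4", "t5"]
labels = ["NOT", "NOT", "NOT", "NOT", "HOF"]
X, y = random_oversample(tweets, labels)
print(Counter(y))  # Counter({'NOT': 4, 'HOF': 4})
```

Because oversampling only duplicates existing examples, it is normally applied to the training split alone so that duplicates do not leak into evaluation.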

A Spammer Identification Method for Class Imbalanced Weibo Datasets

Wenbing Tang, Zuohua Ding, Mengchu Zhou
2019 IEEE Access  
Considering the existence of imbalance problems in spammer identification, an ensemble learning method is used to combine multiple base classifiers for improving the learning performance.  ...  However, most of the previous studies overlook the class imbalance problem of real-world data.  ...  [8] first investigated the class imbalance problem in the domain of Twitter spammer detection, and then combined FOS with ensemble learning to handle the class imbalance problem in Twitter datasets.  ... 
doi:10.1109/access.2019.2901756 fatcat:afsn4epixrdvnliltjkajevtoi

Expert System for the Identification of Review Papers Using Ensemble Learning

Ghulam Mustafa
2021 Pakistan Social Sciences Review  
The presented algorithm provides an efficient alternative to existing algorithms by combining the strengths of the Multiboost ensemble with the sampling technique.  ...  The objective of this work was to apply machine learning techniques to automatically identify review articles, given the imbalanced representation of publication types.  ...  Two key techniques to handle the class imbalance problem are data sampling and ensemble learning (Batista et al., 2004).  ... 
doi:10.35484/pssr.2021(5-i)38 fatcat:f5michv2ibhx3gh6qwrz5prhve

Enhancing the Identification of Cyberbullying through Participant Roles [article]

Gathika Ratnayaka, Thushari Atapattu, Mahen Herath, Georgia Zhang, Katrina Falkner
2020 arXiv   pre-print
We utilise a dataset from ASKfm to perform multi-class classification to detect participant roles (e.g. victim, harasser).  ...  Cyberbullying is a prevalent social problem that inflicts detrimental consequences to the health and safety of victims such as psychological distress, anti-social behaviour, and suicide.  ...  Acknowledgments Authors would like to acknowledge the researchers on the AMiCA project for sharing the dataset.  ... 
arXiv:2010.06640v2 fatcat:3fndeuvydjbfxkjiciuwtqu4hm

Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts [article]

Charangan Vasantharajan, Uthayasanker Thayasivam
2021 arXiv   pre-print
In countries where English is not the native language, social media users mostly use a code-mixed form of text in their posts and comments.  ...  The current study presents extensive experiments using multiple deep learning and transfer learning models to detect offensive content on YouTube.  ...  Acknowledgements We would like to express our thanks to Mr. Sanjeepan Sivapiran and Mr. Temcious Fernando for their helpful suggestions to improve and clarify this manuscript.  ... 
arXiv:2108.10939v2 fatcat:i5cydkgna5cuvijqqspjbv2ewe

Structure-based identification of catalytic residues

Ran Yahalom, Dan Reshef, Ayana Wiener, Sagiv Frankel, Nir Kalisman, Boaz Lerner, Chen Keasar
2011 Proteins: Structure, Function, and Bioinformatics  
ACKNOWLEDGMENTS The authors thank El-ad David Amir for the helpful discussion and insightful comments. The authors are grateful to Ms. Rise Silverman and Ms.  ...  Edna Oxman for carefully editing the manuscript.  ...  Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues.  ... 
doi:10.1002/prot.23020 pmid:21491495 pmcid:PMC3092797 fatcat:ub6lfl4xkbea7co3uem3bxdhbq

Pretrained Transformers for Offensive Language Identification in Tanglish [article]

Sean Benhur, Kanchana Sivanraju
2021 arXiv   pre-print
After the task deadline, we sampled the dataset uniformly and used the MuRIL pretrained model, which helped us achieve a weighted average score of 0.67, the top score on the leaderboard.  ...  This paper describes the system submitted to Dravidian-Codemix-HASOC2021: Hate Speech and Offensive Language Identification in Dravidian Languages (Tamil-English and Malayalam-English).  ...  After the task deadline, we sampled the dataset uniformly to handle the class imbalance problem in this dataset, which helped us improve our score.  ... 
arXiv:2110.02852v4 fatcat:tsuszyi2grea5lqfflrn5ovadi

Author Profiling in Social Media with Multimodal Information

Miguel Á. Álvarez Carmona, Esaú Villatoro Tello, Manuel Montes y Gómez, Luis Villaseñor Pineda
2020 Journal of Computación y Sistemas  
Determining aspects of a person, such as gender, age, residency, or occupation, from his/her texts is a natural language processing task known as author profiling.  ...  In this thesis work, we propose a solution for the task of profiling authors in social networks.  ...  algorithm is to produce a function to map a sample from the attribute space to a class label, i.e., f(x): x → c ∈ C, where x is the sample and C is the set of class labels.  ... 
doi:10.13053/cys-24-3-3488 fatcat:dj6rhmdbf5grhngea6hwocxo7q

Emotion Identification in Movies through Facial Expression Recognition

João Almeida, Luís Vilaça, Inês N. Teixeira, Paula Viana
2021 Applied Sciences  
To cope with the lack of datasets for the scope under analysis, we demonstrated the feasibility of using a generic dataset for the training process and proposed a new way to look at emotions  ...  We presented a comprehensive overview of the most relevant datasets used for FER, highlighting problems caused by their heterogeneity and by the absence of a universal model of emotions.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/app11156827 fatcat:gbg5bntpbrfmldmrpiqkpdz5fy

ICLSTM: Encrypted Traffic Service Identification Based on Inception-LSTM Neural Network

Bei Lu, Nurbol Luktarhan, Chao Ding, Wenhui Zhang
2021 Symmetry  
To alleviate the problem of category imbalance, different weight parameters are set for each category separately in the training phase to make it more symmetrical for different categories of encrypted  ...  This method converts traffic data into common gray images, and then uses the constructed ICLSTM neural network to extract key features and perform effective traffic classification.  ...  For the class imbalance problem, we consider assigning different weights to different classes separately to reduce the impact of class imbalance on the classification effect.  ... 
doi:10.3390/sym13061080 fatcat:z5kubpmdsfhghgctg2neg6lkl4
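The per-class weighting described in the ICLSTM snippet can be illustrated with inverse-frequency weights, w_c = n / (k · n_c) for n training samples, k classes, and n_c samples of class c. The snippet does not give the paper's actual weight values, so this formula (the one scikit-learn uses for `class_weight="balanced"`), the helper names, and the class labels `web` and `voip` are assumptions; the sketch only shows how such weights scale a cross-entropy loss.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights: w_c = n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs: List[Dict[str, float]], labels: Sequence[str],
                           weights: Dict[str, float]) -> float:
    """Per-sample mean of -w_y * log p(y | x), so mistakes on rare classes cost more."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)


# Hypothetical traffic labels: 8 "web" flows and 2 "voip" flows.
labels = ["web"] * 8 + ["voip"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'web': 0.625, 'voip': 2.5}

# A uniform 50/50 prediction on a rare "voip" flow costs 4x more than on a "web" flow.
probs = [{"web": 0.5, "voip": 0.5}] * 2
print(weighted_cross_entropy(probs, ["web", "voip"], weights))
```

Deep learning frameworks usually accept such weights directly as a per-class argument to their cross-entropy loss, which avoids duplicating or discarding any training data.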