Author Identification in Imbalanced Sets of Source Code Samples

E. Chatzicharalampous, G. Frantzeskou, E. Stamatatos
2012 2012 IEEE 24th International Conference on Tools with Artificial Intelligence  
Although very promising results have been reported for this task, the evaluation of existing approaches avoids focusing on the class imbalance problem and its effect on the performance.  ...  In this paper, we present a systematic experimental study of author identification in skewed training sets where the training samples are unequally distributed over the candidate authors.  ...  CONCLUSIONS In this paper, we presented a detailed experimental study of the class imbalance problem in the source code author identification task.  ... 
doi:10.1109/ictai.2012.112 dblp:conf/ictai/ChatzicharalampousFS12 fatcat:gzyd3qumfrfnpj6fuj2u2hebem

Author identification: Using text sampling to handle the class imbalance problem

Efstathios Stamatatos
2008 Information Processing & Management  
Author identification can be seen as a single-label multi-class text categorization problem.  ...  Based on two text corpora of two languages, namely, newswire stories in English and newspaper reportage in Arabic, we present a series of authorship identification experiments on various multi-class imbalanced  ...  CONCLUSIONS Many text categorization tasks, including authorship identification, suffer from the class imbalance problem.  ... 
doi:10.1016/j.ipm.2007.05.012 fatcat:uv62jr5gingkjo6dc76qwxpwk4

Influence of features discretization on accuracy of random forest classifier for web user identification

Alisa A. Vorobeva
2017 2017 20th Conference of Open Innovations Association (FRUCT)  
Data sets with various levels of class imbalance and amounts of training texts per user were used.  ...  The experiments showed that the discretization of features improves the accuracy of identification for all data sets.  ...  Several works focus on important problems in web author identification: how does the number of authors influence identification accuracy?  ... 
doi:10.23919/fruct.2017.8071354 dblp:conf/fruct/Vorobeva17 fatcat:t7rjyifqejdolf5doeucs7j6ra

A Spammer Identification Method for Class Imbalanced Weibo Datasets

Wenbing Tang, Zuohua Ding, Mengchu Zhou
2019 IEEE Access  
However, most of the previous studies overlook the class imbalance problem of real-world data.  ...  INDEX TERMS Class imbalance problem, cost-sensitive SVM, ensemble learning, fuzzy-based oversampling, spammer identification.  ...  [8] first investigated the class imbalance problem in the domain of Twitter spammers detection, and then combine FOS with ensemble learning to handle the class imbalance problem in Twitter datasets.  ... 
doi:10.1109/access.2019.2901756 fatcat:afsn4epixrdvnliltjkajevtoi

Machine Learning and Class Imbalance: A Literature Survey

Swati Narwane, Sudhir Sawarkar
2019 Industrial Engineering Journal  
The collected papers on the class imbalance problem for ML fell into 4 major categories: binary class imbalance, multi-class imbalance, binary and multi-class imbalance, and rare events class imbalance.  ...  The survey focused on various issues in class imbalance for ML. The purpose of the present paper is to help scholars and readers understand the impact of class imbalance on ML.  ...  In the world of Machine Learning (ML) binary class imbalance problem is a relative problem and it is based on the degree of class imbalance, the size of training data, the complexity of the data sets and  ... 
doi:10.26488/iej.12.10.1202 fatcat:42e5bucilfeadlsnqkjbjsy2fe

Authorship Attribution on Imbalanced English Editorial Corpora

O. Srinivasa, N. V., V. Vijaya
2017 International Journal of Computer Applications  
Authorship attribution is an important problem with many practical real-world applications.  ...  Authorship identification determines the likelihood that a piece of writing was produced by a particular author by examining the other writings of that author.  ...  The editorial documents written by various authors may not have the same length, which leads to a class imbalance problem.  ... 
doi:10.5120/ijca2017914587 fatcat:hchhscne7ja3xczpss5tliimau

A Fast Imbalanced Binary Classification Approach to NLOS Identification in UWB Positioning

Bo Song, Sheng-Lin Li, Mian Tan, Qing-Hui Ren
2018 Mathematical Problems in Engineering  
However, in reality, the number of LOS signals in UWB positioning is much larger than the NLOS signals. So the samples are characterized by class-imbalance.  ...  This method does not depend on the number of LOS signals and is suitable for dealing with the problem of classification of the imbalance between the number of LOS and NLOS signals.  ...  Acknowledgments The authors would like to thank Shiwei Tian for providing the dataset of UWB waveforms.  ... 
doi:10.1155/2018/1580147 fatcat:fvi77dscj5et5gxhv3hlayqea4

Improving Imbalanced Question Classification Using Structured Smote Based Approach

Alaa Mohasseb, Mohamed Bader-El-Den, Mihaela Cocea, Han Liu
2018 2018 International Conference on Machine Learning and Cybernetics (ICMLC)  
However, as in many real-world classification problems, QC may suffer from the problem of class imbalance.  ...  In this paper, we propose a framework that deals with the class imbalance using a hierarchical SMOTE algorithm for balancing different types of questions.  ... 
doi:10.1109/icmlc.2018.8527028 dblp:conf/icmlc/MohassebBCL18 fatcat:pf3mq3w2rrhnjpv2berpna6fia

A Systematic Methodology on Class Imbalanced Problems involved in the Classification of Real-World Datasets

2019 International journal of recent technology and engineering  
It is observed from past research studies that most imbalanced data sets consist of major classes and minor classes, and the major class leads the minor class.  ...  Several standard and hybrid prediction algorithms are proposed in various application domains, but most of the real-time data sets analyzed in the studies are imbalanced by nature, thereby affecting  ...  VARIOUS TECHNIQUES IN THE CLASSIFICATION OF DATA INTRINSIC AND CLASS IMBALANCE PROBLEMS The class imbalance problems in selected literature studies are broadly addressed in two different levels that include  ... 
doi:10.35940/ijrte.c5756.098319 fatcat:va4v2lkpynayzab4wn4kzwgwhu

Addressing the Classification with Imbalanced Data: Open Problems and New Challenges on Class Distribution [chapter]

A. Fernández, S. García, F. Herrera
2011 Lecture Notes in Computer Science  
Classifier learning with datasets which suffer from imbalanced class distributions is an important problem in data mining.  ...  Acknowledgment This work had been supported by the Spanish Ministry of Science and Technology under Project TIN2008-06681-C06-01.  ... 
doi:10.1007/978-3-642-21219-2_1 fatcat:ni4ri4aaavf4rbwjtmvxqary4u

Two-Phase Stacking Ensemble to Effectively Handle Data Imbalances in Classification Problems

K. Madasamy
2018 International Journal of Advanced Research in Computer Science  
One of the major challenges in processing real-time data is to handle the implicit data imbalance.  ...  The proposed model utilizes multiple classifier algorithms in the first phase to predict data. The predicted data is used as input for the second phase.  ...  Another mode of dealing with the class imbalance problem is to apply cost sensitive learning.  ... 
doi:10.26483/ijarcs.v9i1.5495 fatcat:ul5eted7qzdtldj4olzosokwpu

Imbalance Robust Softmax for Deep Embeeding Learning [article]

Hao Zhu, Yang Yuan, Guosheng Hu, Xiang Wu, Neil Robertson
2020 arXiv   pre-print
In recent years, one research focus is to solve the open-set problem by discriminative deep embedding learning in the field of face recognition (FR) and person re-identification (re-ID).  ...  IR-Softmax can generalise to any softmax and its variants (which are discriminative for open-set problem) by directly setting the weights as their class centers, naturally solving the data imbalance problem  ...  Surprisingly, very little research explores the problem of data imbalance in FR and re-ID.  ... 
arXiv:2011.11155v1 fatcat:kbargigusva6zjnkehcuawsbrq

New Oversampling Approaches Based on Polynomial Fitting for Imbalanced Data Sets

Sami Gazzah, Najoua Essoukri Ben Amara
2008 2008 The Eighth IAPR International Workshop on Document Analysis Systems  
In this challenging situation, the trained classifier will accurately classify the majority class; nevertheless, it marginalizes the minority class.  ...  However, in some modular architecture, such as one against all in support vector machines classifier, the training dataset for one class risks to heavily outnumber the other classes.  ...  Since the non-author class is composed of 59 subclasses, we have opted to oversampling the minority class rather than undersampling the majority one to avoid within-class imbalance problem.  ... 
doi:10.1109/das.2008.74 dblp:conf/das/GazzahA08 fatcat:rkogfrydingllliujt7ki2nc5y

A survey on generative adversarial networks for imbalance problems in computer vision tasks

Vignesh Sampath, Iñaki Maurtua, Juan José Aguilar Martín, Aitor Gutierrez
2021 Journal of Big Data  
We elaborate the imbalance problems of each group, and provide GANs based solutions in each group.  ...  In this paper, we examine the most recent developments of GANs based techniques for addressing imbalance problems in image data.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their valuable comments and suggestions on the paper.  ... 
doi:10.1186/s40537-021-00414-0 pmid:33552840 pmcid:PMC7845583 fatcat:g3p6hbjuj5c5vbe23ms4g6ed6q

A VPN-Encrypted Traffic Identification Method Based on Ensemble Learning

Jie Cao, Xing-Liang Yuan, Ying Cui, Jia-Cheng Fan, Chin-Ling Chen
2022 Applied Sciences  
class imbalance, improving the Xgboost identification model by using the focal loss function for the data class imbalance problem; Finally, in order to improve the identification rate of VPN-encrypted  ...  Previous encrypted traffic identification methods suffer from feature redundancy, data class imbalance, and low identification rate.  ...  Thus, the amount of data between different classes is quite different, and there is a problem of imbalance in the amount of data between data classes.  ... 
doi:10.3390/app12136434 fatcat:tbgrbg5cijatldrfuzma3ga2z4