Investigating the Effect of Sampling Methods for Imbalanced Data Distributions

Show-Jane Yen, Yue-Shi Lee, Cheng-Han Lin, Jia-Ching Ying
2006 2006 IEEE International Conference on Systems, Man and Cybernetics  
In this paper, we propose a cluster-based sampling approach for selecting the representative data as training data to improve the classification accuracy and investigate the effect of under-sampling methods  ...  It is important to select the suitable training data for classification in the imbalanced class distribution problem.  ...  CONCLUSIONS In a classification task, the effect of imbalanced class distribution problem is often ignored.  ... 
doi:10.1109/icsmc.2006.384787 dblp:conf/smc/YenLLY06 fatcat:hsoh5y2fqndbbeb75ogkfga63m

An Effective Approach for Imbalanced Classification: Unevenly Balanced Bagging

Guohua Liang, Anthony Cohn
Bagging is one of the most popular and effective ensemble learning methods for improving the performance of prediction models; however, there is a major drawback on extremely imbalanced data-sets.  ...  Much research has addressed the problem of imbalanced data by using sampling methods to generate an equally balanced training set to improve the performance of the prediction models, but it is unclear  ...  Sampling techniques are considered to be an effective way to tackle the imbalanced class distribution problem.  ... 
doi:10.1609/aaai.v27i1.8536 fatcat:m6nk26ld25fs5dzll5ysirsx74

An Investigation of Sensitivity on Bagging Predictors: An Empirical Approach

Guohua Liang
methods to address the important issue of understanding the effect of varying levels of class distribution on bagging predictors.  ...  This study empirically investigates the sensitivity of bagging predictors with respect to 12 algorithms and 9 levels of class distribution on 14 imbalanced data-sets by using statistical and graphical  ...  Designed Framework Figure 1 represents designed framework to investigate the sensitivity of bagging predictors as follows: (1) a random under-sampling (RUS) method is used to change original data-set  ... 
doi:10.1609/aaai.v26i1.8415 fatcat:3yopbrgxmje73gfhwrscqd2k34

A Heterogeneous Ensemble Learning Model Based on Data Distribution for Credit Card Fraud Detection

Yalong Xie, Aiping Li, Liqun Gao, Ziniu Liu, Shan Zhong
2021 Wireless Communications and Mobile Computing  
performance for the majority class samples (legal transactions), which greatly increases the investigation cost for banks.  ...  In this paper, we propose a heterogeneous ensemble learning model based on data distribution (HELMDD) to deal with imbalanced data in CCFD.  ...  Acknowledgments This work was partially supported by the National Natural Science Foundation of China (Nos. 61732022, 61732004, 61672020, and 62072131) and the National Key R&D Program of China (Nos. 2017YFB0802204  ... 
doi:10.1155/2021/2531210 fatcat:cjwjrdq43fhcbhnnxhf5zclngi

ADASYN: Adaptive synthetic sampling approach for imbalanced learning

Haibo He, Yang Bai, Edwardo A. Garcia, Shutao Li
2008 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)  
This paper presents a novel adaptive synthetic (ADASYN) sampling approach for learning from imbalanced data sets.  ...  Simulation analyses on several machine learning data sets show the effectiveness of this method across five evaluation metrics.  ...  Based on the original data distribution, ADASYN can adaptively generate synthetic data samples for the minority class to reduce the bias introduced by the imbalanced data distribution.  ... 
doi:10.1109/ijcnn.2008.4633969 dblp:conf/ijcnn/HeBGL08 fatcat:gvlngnj6wnhyjoexorpycvb7fa

Imbalanced sentiment classification

Shoushan Li, Guodong Zhou, Zhongqing Wang, Sophia Yat Mei Lee, Rangyang Wang
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
Evaluation across different datasets shows the effectiveness of both the under-sampling framework and the smoothing strategy in handling the imbalanced problems in real sentiment classification applications  ...  However, most existing studies assume the balance between negative and positive samples, which may not be true in reality. In this paper, we investigate imbalanced sentiment classification instead.  ...  Acknowledgments The research work described in this paper has been partially supported by three NSFC grants, No. 61003155, No. 60873150 and No. 90920004 and Open Projects Program of National Laboratory  ... 
doi:10.1145/2063576.2063994 dblp:conf/cikm/LiZWLW11 fatcat:4csm3ir3zzddtc6bofiyvkng2y

Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list

Yanfang Ye, Tao Li, Kai Huang, Qingshan Jiang, Yong Chen
2009 Journal of Intelligent Information Systems  
Unfortunately, along with the development of the malware writing techniques, the number of file samples in the gray list that need to be analyzed by virus analysts on a daily basis is constantly increasing  ...  The gray list is not only large in size, but also has an imbalanced class distribution where malware is the minority class. In this paper, we describe our research effort on  ...  However, the choices of the class distribution and the size of the training data in the sampling strategy for malware detection are not trivial and need careful investigation.  ... 
doi:10.1007/s10844-009-0086-7 fatcat:vpmheahxwnclzinmm37olk2aha

Flimma: a federated and privacy-preserving tool for differential gene expression analysis [article]

Olga Zolotareva
2020 arXiv   pre-print
Flimma addresses this issue by implementing the state-of-the-art workflow limma voom in a privacy-preserving manner, i.e. patient data never leaves its source site.  ...  Aggregating transcriptomics data across hospitals can increase sensitivity and robustness of differential expression analyses, yielding deeper clinical insights.  ...  This publication reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains.  ... 
arXiv:2010.16403v3 fatcat:deza7falyrfbxggj7zz546du7i

A novel ensemble method for classifying imbalanced data

Zhongbin Sun, Qinbao Song, Xiaoyan Zhu, Heli Sun, Baowen Xu, Yuming Zhou
2015 Pattern Recognition  
alter the original data distribution.  ...  Finally, the classification results of these classifiers for new data are combined by a specific ensemble rule.  ...  Acknowledgment This work is supported by the National Natural Science Foundation of China under Grants 61373046 and 61210004.  ... 
doi:10.1016/j.patcog.2014.11.014 fatcat:s5an7f747jhpnnrfux35grxylu

A New Synthetic Oversampling Method Using Ontology and Feature Selection in Order to Improve Imbalanced Textual Data Classification in Persian Texts

Jafar Pouramini, Behrouz Minaei-Bidgoli
2016 Bulletin de la Société royale des sciences de Liège  
In the new method the number of minor class samples is increased using ontology and then random oversampling is performed for minor class.  ...  Ever-growing extension of textual data has increased the necessity of processing textual data. Data imbalance in classification of textual data is one of the cases that decrease efficiency.  ...  In this research, sampling methods like sub-sampling and pre-sampling were investigated and the results were compared by the other method of collision with imbalanced data like the method of using cost  ... 
doi:10.25518/0037-9565.5414 fatcat:hcnvus36jnbefdx4v4oammojqu

A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction

Qinbao Song, Yuchen Guo, Martin Shepperd
2018 IEEE Transactions on Software Engineering  
clearly harm the performance of traditional learning for SDP; (c) imbalanced learning is more effective on the data sets with moderate or higher imbalance, however negative results are always possible  ...  Second, the appropriate combination of imbalanced method and classifier needs to be carefully chosen to ameliorate the imbalanced learning problem for SDP.  ...  ACKNOWLEDGMENT We thank the reviewers and associate editor for their many constructive suggestions.  ... 
doi:10.1109/tse.2018.2836442 fatcat:osf7ndo2lventgx6q4swq4snmu

Imbalanced Data Classification Based on AdaBoost-SVM

Peng Li, Ting-ting Bi, Xiao-yang Yu, Si-ben Li
2014 International Journal of Database Theory and Application  
In the experiments with four typical forms of imbalanced data sets in UCI were validated the effectiveness of this strategy.  ...  classifier and focusing on the fallible samples by layered combination and  ...  The classification of imbalanced data is one of the most challenging problems in data mining and machine learning research.  ...  National Natural Science Foundation of China (61103149), Technological Innovation Foundation for Youth Scholars of Harbin (2012RFQXG093) and Province Natural Science Foundation of Heilongjiang (QC2013C060  ... 
doi:10.14257/ijdta.2014.7.5.06 fatcat:lqnfipsl7rhrthdgje66grv2wq

Learning from Imbalanced Data

Haibo He, E.A. Garcia
2009 IEEE Transactions on Knowledge and Data Engineering  
The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews.  ...  In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data.  ...  Sampling Methods for Imbalanced Learning Typically, the use of sampling methods in imbalanced learning applications consists of the modification of an imbalanced data set by some mechanisms in order to  ... 
doi:10.1109/tkde.2008.239 fatcat:bztocrruyfgrfcbifxjwimatim

Prediction of hematocrit through imbalanced dataset of blood spectra

Cristoforo Decaro, Giovanni Battista Montanari, Marco Bianconi, Gaetano Bellanca
2021 Healthcare technology letters  
One of the big issues for a machine learning algorithm is related to imbalanced dataset. An imbalanced dataset occurs when the distribution of data is not uniform.  ...  The aim of this work is to show the effects of two balancing techniques (SMOTE and SMOTE+ENN) on the imbalanced dataset of blood spectra.  ...  There are several methods for balancing the distribution of samples in datasets.  ... 
doi:10.1049/htl2.12006 pmid:33850628 pmcid:PMC8024026 fatcat:37osbwlyqnbkxgyruhcywd76d4

Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning [article]

Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu Wang, Alan Yuille
2021 arXiv   pre-print
Based on these findings, we suggest to re-think the current paradigm of having a single data re-sampling strategy and develop a simple yet highly effective Bi-Sampling (BiS) strategy for SSL on class-imbalanced  ...  In particular, we decouple the training of the representation and the classifier, and systematically investigate the effects of different data re-sampling techniques when training the whole network including  ...  In this paper, we investigate SSL in the context of long-tailed data distribution where both labeled and unlabeled data have the same imbalanced class ratio (Figure 1 Right).  ... 
arXiv:2106.00209v2 fatcat:gibpzulpbveebdvnbmrp6ivpza