Investigating the Effect of Sampling Methods for Imbalanced Data Distributions
2006
2006 IEEE International Conference on Systems, Man and Cybernetics
In this paper, we propose a cluster-based sampling approach for selecting the representative data as training data to improve the classification accuracy and investigate the effect of under-sampling methods ...
It is important to select the suitable training data for classification in the imbalanced class distribution problem. ...
Conclusions: In a classification task, the effect of the imbalanced class distribution problem is often ignored. ...
doi:10.1109/icsmc.2006.384787
dblp:conf/smc/YenLLY06
fatcat:hsoh5y2fqndbbeb75ogkfga63m
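The excerpt does not say how the cluster-based approach chooses its representative samples. As a hedged illustration only, the sketch below uses one common cluster-based under-sampling variant: run k-means on the majority class with as many clusters as there are minority samples, then keep the real majority sample nearest each centroid. It is not necessarily the paper's method. The function names `kmeans` and `cluster_undersample` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Hypothetical helper: shrink the majority class to len(minority)
    representatives, one real sample per k-means centroid."""
    k = len(minority)
    if k >= len(majority):
        return list(majority)
    centroids = kmeans(majority, k, seed=seed)
    remaining = list(majority)
    chosen = []
    for c in centroids:
        best = min(remaining, key=lambda p: math.dist(p, c))
        chosen.append(best)
        remaining.remove(best)  # never pick the same sample twice
    return chosen
```

The reduced majority set is then concatenated with the minority samples to form a balanced training set.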
An Effective Approach for Imbalanced Classification: Unevenly Balanced Bagging
2013
Proceedings of the AAAI Conference on Artificial Intelligence
Bagging is one of the most popular and effective ensemble learning methods for improving the performance of prediction models; however, it has a major drawback on extremely imbalanced data sets. ...
Much research has addressed the problem of imbalanced data by using sampling methods to generate an equally balanced training set to improve the performance of the prediction models, but it is unclear ...
Sampling techniques are considered to be an effective way to tackle the imbalanced class distribution problem. ...
doi:10.1609/aaai.v27i1.8536
fatcat:m6nk26ld25fs5dzll5ysirsx74
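Bagging on imbalanced data is often paired with under-sampling so that each bag sees a balanced training set. The sketch below shows plain, evenly balanced under-bagging as a baseline for comparison; it does not reproduce the paper's unevenly balanced scheme. The toy nearest-centroid base learner and all function names are hypothetical stand-ins chosen to keep the example dependency-free.

```python
import math
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Toy base learner: one mean vector per class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in groups.items()}


def nearest_centroid_predict(model, x):
    return min(model, key=lambda label: math.dist(x, model[label]))


def under_bagging_fit(X, y, minority_label, n_bags=11, seed=0):
    """Each bag = every minority sample plus an equal-size random draw
    (with replacement) from the majority class."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    models = []
    for _ in range(n_bags):
        bag = minority + rng.choices(majority, k=len(minority))
        models.append(nearest_centroid_fit([x for x, _ in bag],
                                           [t for _, t in bag]))
    return models


def under_bagging_predict(models, x):
    """Unweighted majority vote over the bag models."""
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

An odd `n_bags` avoids ties in the two-class vote.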
An Investigation of Sensitivity on Bagging Predictors: An Empirical Approach
2012
Proceedings of the AAAI Conference on Artificial Intelligence
methods to address the important issue of understanding the effect of varying levels of class distribution on bagging predictors. ...
This study empirically investigates the sensitivity of bagging predictors with respect to 12 algorithms and 9 levels of class distribution on 14 imbalanced data-sets by using statistical and graphical ...
Designed Framework: Figure 1 represents the designed framework to investigate the sensitivity of bagging predictors as follows: (1) a random under-sampling (RUS) method is used to change the original data set ...
doi:10.1609/aaai.v26i1.8415
fatcat:3yopbrgxmje73gfhwrscqd2k34
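The framework excerpt uses random under-sampling (RUS) to change the class distribution of the original data set. A minimal sketch, assuming a binary task and a target expressed as the minority share of the resampled data; the function name and the `minority_ratio` parameter are hypothetical. The number of majority samples to keep follows from $n_{min} / (n_{min} + n_{maj}) = r$, i.e. $n_{maj} = n_{min}(1 - r)/r$.

```python
import random


def random_undersample(X, y, minority_label, minority_ratio, seed=0):
    """Randomly drop majority samples until the minority class makes up
    `minority_ratio` of the returned data (0.5 gives a 50:50 split)."""
    if not 0 < minority_ratio < 1:
        raise ValueError("minority_ratio must be in (0, 1)")
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == minority_label]
    majority = [i for i, t in enumerate(y) if t != minority_label]
    # n_maj = n_min * (1 - r) / r, capped at the majority samples available
    wanted = round(len(minority) * (1 - minority_ratio) / minority_ratio)
    n_keep = min(len(majority), wanted)
    keep = sorted(minority + rng.sample(majority, n_keep))
    return [X[i] for i in keep], [y[i] for i in keep]
```

Calling it with several ratios yields a series of training sets whose class distributions differ while the minority samples stay fixed.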
A Heterogeneous Ensemble Learning Model Based on Data Distribution for Credit Card Fraud Detection
2021
Wireless Communications and Mobile Computing
performance for the majority class samples (legal transactions), which greatly increases the investigation cost for banks. ...
In this paper, we propose a heterogeneous ensemble learning model based on data distribution (HELMDD) to deal with imbalanced data in CCFD. ...
Acknowledgments This work was partially supported by the National Natural Science Foundation of China (Nos. 61732022, 61732004, 61672020, and 62072131) and the National Key R&D Program of China (Nos. 2017YFB0802204 ...
doi:10.1155/2021/2531210
fatcat:cjwjrdq43fhcbhnnxhf5zclngi
ADASYN: Adaptive synthetic sampling approach for imbalanced learning
2008
2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)
This paper presents a novel adaptive synthetic (ADASYN) sampling approach for learning from imbalanced data sets. ...
Simulation analyses on several machine learning data sets show the effectiveness of this method across five evaluation metrics. ...
Based on the original data distribution, ADASYN can adaptively generate synthetic data samples for the minority class to reduce the bias introduced by the imbalanced data distribution. ...
doi:10.1109/ijcnn.2008.4633969
dblp:conf/ijcnn/HeBGL08
fatcat:gvlngnj6wnhyjoexorpycvb7fa
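The adaptive generation described above can be sketched from the published recipe: create $G = (m_l - m_s)\beta$ synthetic samples in total, weight each minority point $x_i$ by $r_i = \Delta_i / K$ (the fraction of majority samples among its $K$ nearest neighbours), normalize the weights, and interpolate $s = x_i + \lambda(x_{zi} - x_i)$ toward random minority neighbours. This stdlib-only sketch is not the authors' code; the function name, the rounding of per-point counts, and the guard for an all-zero weight sum are assumptions.

```python
import math
import random


def adasyn(X, y, minority_label, beta=1.0, k=5, seed=0):
    """Return synthetic minority samples (tuples) following the ADASYN idea."""
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == minority_label]
    n_min, n_maj = len(min_idx), len(y) - len(min_idx)
    G = round((n_maj - n_min) * beta)  # total synthetic samples to create
    if G <= 0 or n_min < 2:
        return []

    def nearest(i, pool):
        """Indices in `pool` (excluding i), sorted by distance to X[i]."""
        return sorted((j for j in pool if j != i),
                      key=lambda j: math.dist(X[i], X[j]))

    # r_i: share of majority samples among the k nearest neighbours (all data)
    r = [sum(y[j] != minority_label for j in nearest(i, range(len(y)))[:k]) / k
         for i in min_idx]
    total = sum(r)
    if total == 0:  # assumption: nothing to do if no minority point is "hard"
        return []
    g = [round(ri / total * G) for ri in r]  # per-point synthetic counts

    synthetic = []
    for i, gi in zip(min_idx, g):
        neighbours = nearest(i, min_idx)[:k]  # minority-only neighbours
        for _ in range(gi):
            z = rng.choice(neighbours)
            lam = rng.random()  # lambda drawn uniformly from [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X[z])))
    return synthetic
```

Minority points surrounded by more majority neighbours receive more synthetic samples, which is what shifts the decision boundary toward the hard-to-learn region.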
Imbalanced sentiment classification
2011
Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11
Evaluation across different datasets shows the effectiveness of both the under-sampling framework and the smoothing strategy in handling the imbalanced problems in real sentiment classification applications ...
However, most existing studies assume the balance between negative and positive samples, which may not be true in reality. In this paper, we investigate imbalanced sentiment classification instead. ...
Acknowledgments The research work described in this paper has been partially supported by three NSFC grants, No. 61003155, No. 60873150 and No. 90920004 and Open Projects Program of National Laboratory ...
doi:10.1145/2063576.2063994
dblp:conf/cikm/LiZWLW11
fatcat:4csm3ir3zzddtc6bofiyvkng2y
Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list
2009
Journal of Intelligent Information Systems
Unfortunately, along with the development of the malware writing techniques, the number of file samples in the gray list that need to be analyzed by virus analysts on a daily basis is constantly increasing ...
The gray list is not only large in size, but also has an imbalanced class distribution where malware is the minority class. In this paper, we describe our research effort on ...
However, the choices of the class distribution and the size of the training data in the sampling strategy for malware detection are not trivial and need careful investigation. ...
doi:10.1007/s10844-009-0086-7
fatcat:vpmheahxwnclzinmm37olk2aha
Flimma: a federated and privacy-preserving tool for differential gene expression analysis
[article]
2020
arXiv
pre-print
Flimma (https://exbio.wzw.tum.de/flimma/) addresses this issue by implementing the state-of-the-art workflow limma voom in a privacy-preserving manner, i.e. patient data never leaves its source site. ...
Aggregating transcriptomics data across hospitals can increase sensitivity and robustness of differential expression analyses, yielding deeper clinical insights. ...
This publication reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains. ...
arXiv:2010.16403v3
fatcat:deza7falyrfbxggj7zz546du7i
A novel ensemble method for classifying imbalanced data
2015
Pattern Recognition
alter the original data distribution. ...
Finally, the classification results of these classifiers for new data are combined by a specific ensemble rule. ...
Acknowledgment This work is supported by the National Natural Science Foundation of China under Grants 61373046 and 61210004. ...
doi:10.1016/j.patcog.2014.11.014
fatcat:s5an7f747jhpnnrfux35grxylu
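The excerpt says only that base-classifier outputs are combined by "a specific ensemble rule" without naming it. Two common combination rules, unweighted majority voting and weighted voting, are sketched below as assumptions, not as the paper's rule; both function names are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """One label per base classifier -> single label.
    Ties go to the label that appears first in `predictions`."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Sum each classifier's weight onto its predicted label; highest total wins."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)
```

Weighted voting lets more accurate base classifiers (for example, those with higher minority-class recall on validation data) count for more.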
A New Synthetic Oversampling Method Using Ontology and Feature Selection in Order to Improve Imbalanced Textual Data Classification in Persian Texts
2016
Bulletin de la Société royale des sciences de Liège
In the new method, the number of minority class samples is increased using an ontology, and then random oversampling is performed for the minority class. ...
The ever-growing volume of textual data has increased the need for text processing. Data imbalance is one of the factors that reduce efficiency in text classification. ...
In this research, sampling methods such as sub-sampling and pre-sampling were investigated, and the results were compared with other methods of dealing with imbalanced data, such as the method of using cost ...
doi:10.25518/0037-9565.5414
fatcat:hcnvus36jnbefdx4v4oammojqu
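Random oversampling, the second step of the method described above, duplicates existing minority samples. A minimal sketch for a binary task, assuming the goal is a 1:1 class ratio; the function name and balancing target are hypothetical, and the ontology-based expansion step is not shown.

```python
import random


def random_oversample(X, y, minority_label, seed=0):
    """Append randomly chosen minority duplicates (with replacement) until
    the minority class is as large as the rest of the data combined."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == minority_label]
    n_extra = (len(y) - len(minority)) - len(minority)
    if n_extra <= 0 or not minority:
        return list(X), list(y)
    extra = rng.choices(minority, k=n_extra)
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]
```

Because it only repeats existing samples, random oversampling adds no new information and can encourage overfitting to the duplicated points.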
A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction
2018
IEEE Transactions on Software Engineering
clearly harm the performance of traditional learning for SDP; (c) imbalanced learning is more effective on the data sets with moderate or higher imbalance, however negative results are always possible ...
Second, the appropriate combination of imbalanced method and classifier needs to be carefully chosen to ameliorate the imbalanced learning problem for SDP. ...
ACKNOWLEDGMENT We thank the reviewers and associate editor for their many constructive suggestions. ...
doi:10.1109/tse.2018.2836442
fatcat:osf7ndo2lventgx6q4swq4snmu
Imbalanced Data Classification Based on AdaBoost-SVM
2014
International Journal of Database Theory and Application
Experiments on four typical imbalanced data sets from UCI validated the effectiveness of this strategy. ... classifier and focusing on the fallible samples by layered combination and ...
The classification of imbalanced data is one of the most challenging problems in data mining and machine learning research. ...
National Natural Science Foundation of China (61103149), Technological Innovation Foundation for Youth Scholars of Harbin (2012RFQXG093) and Province Natural Science Foundation of Heilongjiang (QC2013C060 ...
doi:10.14257/ijdta.2014.7.5.06
fatcat:lqnfipsl7rhrthdgje66grv2wq
Learning from Imbalanced Data
2009
IEEE Transactions on Knowledge and Data Engineering
The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. ...
In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. ...
Sampling Methods for Imbalanced Learning Typically, the use of sampling methods in imbalanced learning applications consists of the modification of an imbalanced data set by some mechanisms in order to ...
doi:10.1109/tkde.2008.239
fatcat:bztocrruyfgrfcbifxjwimatim
Prediction of hematocrit through imbalanced dataset of blood spectra
2021
Healthcare technology letters
One of the big issues for a machine learning algorithm is related to imbalanced datasets. An imbalanced dataset occurs when the distribution of samples across classes is not uniform. ...
The aim of this work is to show the effects of two balancing techniques (SMOTE and SMOTE+ENN) on the imbalanced dataset of blood spectra. ...
There are several methods for balancing the distribution of samples in datasets. ...
doi:10.1049/htl2.12006
pmid:33850628
pmcid:PMC8024026
fatcat:37osbwlyqnbkxgyruhcywd76d4
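SMOTE, one of the two balancing techniques named above, creates synthetic minority points on the line segments between a minority sample and one of its nearest minority neighbours. The sketch below picks the base sample at random for each new point, a common simplification of the original per-sample loop; the function name is hypothetical and the ENN cleaning step of SMOTE+ENN is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic
```

Every synthetic point lies inside the convex hull of the minority samples, so SMOTE fills gaps in the minority region rather than extrapolating beyond it.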
Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning
[article]
2021
arXiv
pre-print
Based on these findings, we suggest re-thinking the current paradigm of having a single data re-sampling strategy and develop a simple yet highly effective Bi-Sampling (BiS) strategy for SSL on class-imbalanced ...
In particular, we decouple the training of the representation and the classifier, and systematically investigate the effects of different data re-sampling techniques when training the whole network including ...
In this paper, we investigate SSL in the context of long-tailed data distribution where both labeled and unlabeled data have the same imbalanced class ratio (Figure 1 Right). ...
arXiv:2106.00209v2
fatcat:gibpzulpbveebdvnbmrp6ivpza