When is resampling beneficial for feature selection with imbalanced wide data?
2021
Expert systems with applications
Additionally, specific results are also obtained depending on the classifier used, for example, for Gaussian SVM the best performance is obtained when the feature selection is done with SVM-RFE before ...
This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. ...
This problem is even more relevant when dealing with wide data, where the number of features is extremely high. ...
doi:10.1016/j.eswa.2021.116015
fatcat:gfy7cwrpxnhc7crkxo3bid7mze
A Study of Data Pre-processing Techniques for Imbalanced Biomedical Data Classification
[article]
2019
arXiv
pre-print
methods in most cases, thus, Feature Selection with SVM classifier is the best choice for imbalanced biomedical data learning. ...
However, resampling and Feature Selection techniques perform poorly when using C4.5 decision tree and Linear discriminant analysis classifiers; (2) for datasets with different distributions, techniques ...
In the meantime, considering that feature selection (FS) is also beneficial to imbalanced data learning, one of the recently developed FS approaches is also employed in this study (Yu et al. 2014). ...
arXiv:1911.00996v1
fatcat:vrkuuh7ptbaa3p4kgd2eb47tbi
A Comparative Analysis of Data Resampling Methods on Imbalance Medical Data
2021
IEEE Access
Each categorical feature with n categories is converted to n binary (0-1) features [95, 96].
D. ...
We imputed their median values for former smokers with missing entries for their age when they quit smoking. ...
doi:10.1109/access.2021.3102399
fatcat:4foj6xyyovanfhnr5z5fcvh5py
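The conversion described in this snippet (one categorical feature with n categories becoming n binary 0-1 features) is commonly called one-hot encoding. The following is a minimal sketch in plain Python, not the paper's own code; the function name `one_hot` and the smoking-status values are hypothetical and chosen only for illustration.

```python
def one_hot(values):
    """Turn a list of categorical values into n binary (0-1) columns,
    where n is the number of distinct categories observed."""
    # Sort the categories so the column order is reproducible.
    categories = sorted(set(values))
    # One row per input value; exactly one column is set to 1 in each row.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical example data (not taken from the paper).
categories, rows = one_hot(["never", "former", "current", "former"])
for row in rows:
    print(dict(zip(categories, row)))
```

Each output row contains a single 1, so a feature with three categories yields three binary columns.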
Malicious web domain identification using online credibility and performance data by considering the class imbalance issue
2019
Industrial management & data systems
Findings: By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain datasets with different ...
An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world datasets with different imbalance ratios. ...
We would like to thank the handling editor and two anonymous reviewers for their valuable comments and suggestions on the previous version of this paper. ...
doi:10.1108/imds-02-2018-0072
fatcat:ugiip2ydtrdfrcrm6friudlkby
Partial Resampling of Imbalanced Data
[article]
2022
arXiv
pre-print
Imbalanced data is a frequently encountered problem in machine learning. ...
Despite a vast amount of literature on sampling techniques for imbalanced data, there is a limited number of studies that address the issue of the optimal sampling ratio. ...
It appears that the SVM classifier is better suited for imbalanced data when used in conjunction with data sampling. The details of the SVM-based experiments are supplied in Table 6 and Figure 3. ...
arXiv:2207.04631v1
fatcat:shwanrkkrjhsncdg62s63kxvou
Online Defect Prediction for Imbalanced Data
2015
2015 IEEE/ACM 37th IEEE International Conference on Software Engineering
First, the data are imbalanced: there are much fewer buggy changes than clean changes. ...
Accepted for publication by IEEE. © 2015 IEEE. Personal use of this material is permitted. ...
We use four types of resampling techniques to predict for the imbalanced data: simple duplicate, SMOTE, spread subsample, and resampling with/without replacement [24]. ...
doi:10.1109/icse.2015.139
dblp:conf/icse/TanTDM15
fatcat:xav66z6k7vaw3bl3moofrncvf4
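Of the four resampling techniques named in this snippet, duplication is the simplest to illustrate. The sketch below shows one generic form of it, random oversampling of the minority class by copying existing examples until the two classes are the same size. It assumes a binary problem and is not necessarily the exact "simple duplicate" procedure used in the paper; the function name `duplicate_oversample` and the sample data are hypothetical.

```python
import random


def duplicate_oversample(samples, labels, minority_label, seed=0):
    """Balance a binary dataset by duplicating randomly chosen
    minority-class samples (assumption: exactly two classes)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples carry the minority label")
    majority_count = len(labels) - len(minority)
    # Number of copies needed to match the majority class size.
    needed = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(needed)]
    return list(samples) + extra, list(labels) + [minority_label] * needed


# Hypothetical data: four clean changes and one buggy change.
samples = ["c1", "c2", "c3", "c4", "b1"]
labels = ["clean", "clean", "clean", "clean", "buggy"]
new_samples, new_labels = duplicate_oversample(samples, labels, "buggy")
print(new_labels.count("clean"), new_labels.count("buggy"))
```

Resampling of this kind is normally applied to the training split only, so that duplicated examples do not leak into the evaluation data.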
Experimental evaluation of ensemble classifiers for imbalance in Big Data
2021
Applied Soft Computing
In this paper, in-depth experimentation with ensemble classifiers is conducted in the context of imbalanced Big Data classification, using two popular ensemble families (Bagging and Boosting) and different ...
A common problem for classification, especially in Big Data, is that the numerous examples of the different classes might not be balanced. ...
This material is based upon work supported by Google Cloud, United States. ...
doi:10.1016/j.asoc.2021.107447
fatcat:4glhtjzn4vbbndrdm64hhwxtj4
A Universal Data Augmentation Approach for Fault Localization
2022
International Conference on Software Engineering
However, the input data is high-dimensional and extremely imbalanced since the real-world programs are large in size and the number of failing test cases is much less than that of passing test cases, which ...
Then, Aeneas handles the imbalanced data issue through generating synthesized failing test cases from the reduced feature space through conditional variational autoencoder (CVAE). ...
Aeneas is a novel approach to handle the problems of high-dimensional and extremely imbalanced data by feature selection and data synthesis, respectively. ...
doi:10.1145/3510003.3510136
dblp:conf/icse/XieLY00M22
fatcat:xrulttxmynckpdcqkqd76vrwpa
PSU: Particle Stacking Undersampling Method For Highly Imbalanced Big Data
2020
IEEE Access
Imbalanced classes are a common problem in machine learning, and the computational costs required for proper resampling increase with the data size. ...
INDEX TERMS Data mining, imbalanced data, undersampling, big data, support vector machines. ...
INTRODUCTION Dealing with imbalanced data is a crucial task in data mining studies. ...
doi:10.1109/access.2020.3009753
fatcat:viaqhideqra7jlz5ftm2td2epi
A Heterogeneous Ensemble Learning Model Based on Data Distribution for Credit Card Fraud Detection
2021
Wireless Communications and Mobile Computing
In this paper, we propose a heterogeneous ensemble learning model based on data distribution (HELMDD) to deal with imbalanced data in CCFD. ...
Credit card fraud detection (CCFD) is important for protecting the cardholder's property and the reputation of banks. ...
Resampling is a widely used method to address the problem of imbalanced classification data. ...
doi:10.1155/2021/2531210
fatcat:cjwjrdq43fhcbhnnxhf5zclngi
Big data preprocessing: methods and prospects
2016
Big Data Analytics
Addressing big data is a challenging and time-demanding task that requires a large computational infrastructure to ensure successful data processing and analysis. ...
The massive growth in the scale of data has been observed in recent years being a key factor of the Big Data scenario. ...
This method is also designed for matrices with a low number of features. ...
doi:10.1186/s41044-016-0014-0
fatcat:z3lqu2yi3vey3khbdal6mu34qa
Optimization of data resampling through GA for the classification of imbalanced datasets
2019
IJAIN (International Journal of Advances in Intelligent Informatics)
This paper overviews a novel family of methods for the resampling of an imbalanced dataset in order to maximize the performance of arbitrary data-driven classifiers. ...
Classification of imbalanced datasets is a critical problem in numerous contexts. ...
These classifiers, in fact, aim at maximizing the overall performance that is achieved when coping with balanced datasets but it is not when the training dataset is imbalanced: in this latter case the ...
doi:10.26555/ijain.v5i3.409
fatcat:bmdt43ln4jdyrg32ksgk6dnwqu
CCR: A combined cleaning and resampling algorithm for imbalanced data classification
2017
International Journal of Applied Mathematics and Computer Science
Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. ...
In this paper we describe a novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task. ...
One of the most important questions we have to ask when dealing with imbalanced data is what performance measure should we optimize for. ...
doi:10.1515/amcs-2017-0050
fatcat:me52726ub5folfedmkcp5f7b5i
Self-paced Ensemble for Highly Imbalanced Massive Data Classification
[article]
2019
arXiv
pre-print
Many real-world applications reveal difficulties in learning classifiers from imbalanced data. ...
The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalanced and low-quality datasets. ...
Notice that we update hardness value in each iteration (line 4-5) in order to select data samples that were most beneficial for the current ensemble. ...
arXiv:1909.03500v2
fatcat:l3uitgbvl5cjpj7f3eskvlstmi
A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework
[article]
2022
arXiv
pre-print
Additionally, we formulate open challenges and future directions for this domain. Our experimental testbed is fully reproducible and easy to extend with new methods. ...
We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and we provide general recommendations to end-users for selecting the best algorithms for imbalanced ...
Acknowledgements High Performance Computing resources provided by the High Performance Research Computing (HPRC) Core Facility at Virginia Commonwealth University (https://hprc.vcu.edu) were used for conducting ...
arXiv:2204.03719v1
fatcat:dulhr3cedrh6vd6m5m4qovffri