Cluster-based under-sampling approaches for imbalanced data distributions

Show-Jane Yen, Yue-Shi Lee
2009 Expert systems with applications  
In this paper, we propose cluster-based under-sampling approaches for selecting the representative data as training data to improve the classification accuracy for minority class and investigate the effect  ...  The experimental results show that our cluster-based under-sampling approaches outperform the other under-sampling techniques in the previous studies.  ...  Table 1 shows the steps for our cluster-based under-sampling method SBC. For example, assume that an imbalanced class distribution dataset has totally 1100 samples.  ... 
doi:10.1016/j.eswa.2008.06.108 fatcat:btwnfjjx35gptcavitzdc26ngy

Imbalanced sentiment classification

Shoushan Li, Guodong Zhou, Zhongqing Wang, Sophia Yat Mei Lee, Rangyang Wang
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
In particular, a novel clustering-based stratified under-sampling framework and a centroid-directed smoothing strategy are proposed to address the imbalanced class and feature distribution problems respectively  ...  Evaluation across different datasets shows the effectiveness of both the under-sampling framework and the smoothing strategy in handling the imbalanced problems in real sentiment classification applications  ...  1. 6) Clustering-based under-sampling (ClusterU): performing clustering-based stratified under-sampling. 7) Clustering-based under-sampling plus centroid-directed smoothing (ClusterUC): performing  ... 
doi:10.1145/2063576.2063994 dblp:conf/cikm/LiZWLW11 fatcat:4csm3ir3zzddtc6bofiyvkng2y

Investigating the Effect of Sampling Methods for Imbalanced Data Distributions

Show-Jane Yen, Yue-Shi Lee, Cheng-Han Lin, Jia-Ching Ying
2006 2006 IEEE International Conference on Systems, Man and Cybernetics  
In this paper, we propose a cluster-based sampling approach for selecting the representative data as training data to improve the classification accuracy and investigate the effect of under-sampling methods  ...  In the experiments, we evaluate the performances for our cluster-based sampling approach and the other sampling methods in the previous studies.  ...  Table 1 shows the steps for our cluster-based under-sampling method SBC. For example, assume that an imbalanced class distribution dataset has totally 1100 samples.  ... 
doi:10.1109/icsmc.2006.384787 dblp:conf/smc/YenLLY06 fatcat:hsoh5y2fqndbbeb75ogkfga63m

A novel imbalanced data classification approach using both under and over sampling

Seyyed Mohammad Javadi Moghaddam, Asadollah Noroozi
2021 Bulletin of Electrical Engineering and Informatics  
This paper presents a novel pre-processing technique that performs both over and under sampling algorithms for an imbalanced dataset.  ...  The performance of the data classification has encountered a problem when the data distribution is imbalanced.  ...  ACKNOWLEDGEMENTS The author would like to acknowledge the financial support of the Bozorgmehr University of Qaenat for this research under contract number 39141.  ... 
doi:10.11591/eei.v10i5.2785 fatcat:xafchkmabjbptec257qqglmvfa

PBCCUT- Priority based Class Clustered under Sampling Technique Approaches for Imbalanced Data Classification

Nagasuri Anuradha, G. Partha Saradhi Varma
2017 Indian Journal of Science and Technology  
This research paper proposes the result of the accurateness of the result by using the Priority Based Class Clustered under sampling Technique approaches for imbalanced data classification.  ...  Improvements: The present paper proposes a cluster-based priority under-sampling approach to select the representative data as training data to get better categorization and correctness for minority class  ...  Results obtained by the present approach are shown in Figure 7. The proposed PBCCUT-Priority Based Class Clustered under sampling Technique approaches for imbalanced data classification.  ... 
doi:10.17485/ijst/2017/v10i18/107590 fatcat:ddkvwykg75bwxlevjiqzhimykq

A Review on Imbalanced Data Handling Using Undersampling and Oversampling Technique

2017 International Journal of Recent Trends in Engineering and Research  
For extracting useful data from such large datasets, different data mining or machine learning techniques are used.  ...  In today's era of the internet, the amount of data generated keeps increasing. Some of the data related to medical, e-commerce, social networking, etc. are of great importance.  ...  In order to overcome this drawback of the under-sampling approach, Yen and Lee (2009) proposed an unsupervised learning technique for supervised learning called cluster-based under-sampling.  ... 
doi:10.23883/ijrter.2017.3168.0uwxm fatcat:2qaqw5zbsncuhkvkvelws7ettm

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification [article]

Farshid Rayhan, Sajid Ahmed, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid
2017 arXiv   pre-print
In this paper, we introduce a new clustering-based under-sampling approach with boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification.  ...  The experimental results show that the CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.  ...  Fig. 1: Sampling with boosting for classifying imbalanced data. Fig. 2: Random under-sampling (RUS) approach.  ... 
arXiv:1712.04356v1 fatcat:55wzik33ingsrih4zy3zzfphgy

A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem

Duygu Sinanc Terzi, Seref Sagiroglu
2019 Applied Computer Systems  
To resolve this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID).  ...  According to the results, DIBID outperformed other imbalanced big data solutions in the literature and increased area under the curve values between 10 % and 24 % through the case study.  ...  ACKNOWLEDGMENT The authors wish to thank Gazi University Project Unit (GAZİ BAP) for its contribution to the Gazi University Big Data and Information Security Center (GAZİ BIDISEC) under grant 06/2015-  ... 
doi:10.2478/acss-2019-0013 fatcat:pfo4cgve5bee7cs3wecnoerf3e

K-Means Cluster Based Undersampling Ensemble for Imbalanced Data Classification

2020 International Journal of Engineering and Advanced Technology  
In this paper, K-Means cluster based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem.  ...  The proposed method combines K-Means cluster based undersampling and boosting method.  ...  Small sample size The sample size plays a crucial role in imbalanced data classification. If the sample size is limited, the imbalanced data classification performance deteriorates.  ... 
doi:10.35940/ijeat.c5188.029320 fatcat:hzoe3vtk7zdcjfxcuxcb3ok2fy

QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem

Alexey V. Zakharov, Megan L. Peach, Markus Sitzmann, Marc C. Nicklaus
2014 Journal of Chemical Information and Modeling  
We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets.  ...  Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds.  ...  Each Imbalanced Learning Approach: multiple under-sampling threshold, ratio 1:3 under-sampling, one-sided under-sampling, similarity under-sampling, cluster under-sampling, diversity under-  ... 
doi:10.1021/ci400737s pmid:24524735 pmcid:PMC3985743 fatcat:l2ishyaknzdotg6utphdulpvbi

Imbalanced Classification Based on Active Learning SMOTE

Ying Mi
2013 Research Journal of Applied Sciences Engineering and Technology  
To solve this problem, this study introduces the classification performance of support vector machine and presents an approach based on active learning SMOTE to classify the imbalanced data.  ...  SMOTE is a typical over-sampling technique which can effectively balance the imbalanced data. However, it brings noise and other problems affecting the classification accuracy.  ...  The experimental results show that our cluster-based under-sampling approaches outperform the other under-sampling techniques in the previous studies.  ... 
doi:10.19026/rjaset.5.5044 fatcat:vsvhh2uhebf6zmf523wbmmrhny

A Survey of Predictive Modelling under Imbalanced Distributions [article]

Paula Branco, Luis Torgo, Rita Ribeiro
2015 arXiv   pre-print
Many real world data mining applications involve obtaining predictive models using data sets with strongly imbalanced distributions of the target variable.  ...  In this survey we discuss the main challenges raised by imbalanced distributions, describe the main approaches to these problems, propose a taxonomy of these methods and refer to some related problems  ...  The existing re-sampling strategies are based on a diverse set of techniques such as: random under/over-sampling, distance methods, data cleaning approaches, clustering algorithms, synthesising new data  ... 
arXiv:1505.01658v2 fatcat:4335ofkuhjdpddjfs56daw4rju

GCN-Based Linkage Prediction for Face Clustering on Imbalanced Datasets: An Empirical Study [article]

Huafeng Yang, Xingjian Chen, Fangyi Zhang, Guangyue Hei, Yunjie Wang, Rong Du
2021 arXiv   pre-print
However, rare attention has been paid to GCN-based clustering on imbalanced data.  ...  Although imbalance problem has been extensively studied, the impact of imbalanced data on GCN-based linkage prediction task is quite different, which would cause problems in two aspects: imbalanced linkage  ...  Related Work GCN-based face clustering. Face clustering is essential for exploiting unlabeled face data, and has been widely used in many scenarios.  ... 
arXiv:2107.02477v2 fatcat:p6thznegwjfslitl5zvyx45cna

A novel approach for solving skewed classification problem using cluster based ensemble method

Gillala Rekha, V Krishna Reddy, Amit Kumar Tyagi (Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India – 522502; Vellore Institute of Technology, Chennai Campus, Chennai, 600127, Tamilnadu, India)
2019 Mathematical Foundations of Computing  
In this paper, we present a cluster-based oversampling with boosting algorithm (Cluster+Boost) for learning from imbalanced data.  ...  The experimental results are promising and provide an alternative approach for improving the performance of the classifier when learned on highly imbalanced data sets. 1. Introduction.  ...  [19] proposed clustering-based under-sampling with boosting called CUSBoost.  ... 
doi:10.3934/mfc.2020001 fatcat:jvgjznnv3jcu7kgfedytmlqhbq

Multi-Cluster Based Approach for skewed Data in Data Mining

Mr. Rushi Longadge
2013 IOSR Journal of Computer Engineering  
To solve this problem we propose multi cluster-based majority under-sampling and random minority oversampling approach.  ...  Compared to under-sampling, cluster-based random undersampling can effectively avoid the important information loss of majority class.  ...  For our multi-cluster-based majority under-sampling prediction algorithm, we first divide all the majority class samples in the data set into k clusters.  ... 
doi:10.9790/0661-1266673 fatcat:duvmrtdfw5edhe6p4ps775kg7i
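
Several of the entries above describe cluster-based under-sampling, and the last snippet states that the majority class samples are first divided into k clusters. A minimal Python sketch of that general idea follows. It is not the method of any listed paper: the tiny k-means routine, the proportional per-cluster quota, and the names `kmeans`, `cluster_undersample`, `n_keep`, and `k` are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Tiny Lloyd's k-means (illustrative only); returns one cluster label per row."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def cluster_undersample(X_major, n_keep, k=3, seed=0):
    """Return indices of roughly n_keep majority rows.

    Assumption: each cluster contributes rows in proportion to its size,
    so the reduced majority set still covers every cluster.
    """
    X_major = np.asarray(X_major, dtype=float)
    rng = np.random.default_rng(seed)
    labels = kmeans(X_major, k, seed=seed)
    keep = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        quota = min(len(idx), round(n_keep * len(idx) / len(X_major)))
        if quota:
            keep.extend(rng.choice(idx, size=quota, replace=False))
    return np.sort(np.array(keep, dtype=int))

# Hypothetical usage: shrink a synthetic majority class of 1000 rows to about 100.
demo_rng = np.random.default_rng(1)
X_major_demo = demo_rng.normal(size=(1000, 2))
kept = cluster_undersample(X_major_demo, n_keep=100, k=3)
print(len(kept), "majority rows kept")
```

Because each cluster's quota is rounded separately, the number of rows kept can differ from `n_keep` by a row or two. The kept majority rows would then be combined with all minority rows to form the training set.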