Imbalanced sentiment classification

Shoushan Li, Guodong Zhou, Zhongqing Wang, Sophia Yat Mei Lee, Rangyang Wang
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
In particular, a novel clustering-based stratified under-sampling framework and a centroid-directed smoothing strategy are proposed to address the imbalanced class and feature distribution problems respectively  ...  However, most existing studies assume the balance between negative and positive samples, which may not be true in reality. In this paper, we investigate imbalanced sentiment classification instead.  ...  In this paper, we propose a clustering-based stratified under-sampling framework to overcome the imbalanced class distribution problem in imbalanced sentiment classification.  ... 
doi:10.1145/2063576.2063994 dblp:conf/cikm/LiZWLW11 fatcat:4csm3ir3zzddtc6bofiyvkng2y

Investigating the Effect of Sampling Methods for Imbalanced Data Distributions

Show-Jane Yen, Yue-Shi Lee, Cheng-Han Lin, Jia-Ching Ying
2006 2006 IEEE International Conference on Systems, Man and Cybernetics  
In this paper, we propose a cluster-based sampling approach for selecting the representative data as training data to improve the classification accuracy and investigate the effect of under-sampling methods  ...  In the experiments, we evaluate the performance of our cluster-based sampling approach and the other sampling methods in the previous studies.  ...  In this study, we propose a cluster-based under-sampling approach to solve the imbalanced class distribution problem by using a backpropagation neural network.  ... 
doi:10.1109/icsmc.2006.384787 dblp:conf/smc/YenLLY06 fatcat:hsoh5y2fqndbbeb75ogkfga63m

K-Means Cluster Based Undersampling Ensemble for Imbalanced Data Classification

2020 International Journal of Engineering and Advanced Technology  
In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem.  ...  The proposed method combines K-Means cluster-based undersampling and a boosting method.  ...  They are clustering with NearMiss-1, NearMiss-2, and NearMiss-3, sampling based on clustering with Most Distance, and sampling based on clustering with Most Far to choose the majority instances.  ... 
doi:10.35940/ijeat.c5188.029320 fatcat:hzoe3vtk7zdcjfxcuxcb3ok2fy

A novel approach for solving skewed classification problem using cluster based ensemble method

Gillala Rekha; Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India – 522502; V Krishna Reddy; Amit Kumar Tyagi; Vellore Institute of Technology, Chennai Campus, Chennai, 600127, Tamilnadu, India
2019 Mathematical Foundations of Computing  
In this paper, we present a cluster-based oversampling with boosting algorithm (Cluster+Boost) for learning from imbalanced data.  ...  Several techniques have been proposed to handle the problem of class imbalance, including data sampling and boosting.  ...  To overcome such issues, newer data generation methods based on clustering approach have been proposed.  ... 
doi:10.3934/mfc.2020001 fatcat:jvgjznnv3jcu7kgfedytmlqhbq

A Review on Imbalanced Data Handling Using Undersampling and Oversampling Technique

2017 International Journal of Recent Trends in Engineering and Research  
However, the imbalanced nature of these datasets greatly affects classifier performance. To deal with this, it is necessary to understand the problem of imbalanced learning.  ...  In today's internet era, the amount of data generated keeps growing. Some of the data, related to medical, e-commerce, social networking, etc., are of great importance.  ...  This leads to the problem of imbalanced data. Data with an unequal distribution of samples among classes are known as imbalanced data.  ... 
doi:10.23883/ijrter.2017.3168.0uwxm fatcat:2qaqw5zbsncuhkvkvelws7ettm
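
The definition in the snippet above (an unequal distribution of samples among classes) is often summarized as an imbalance ratio. The following is a minimal sketch, not taken from the reviewed paper; the function name `imbalance_ratio` and the toy labels are assumptions made for illustration.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())

# Hypothetical toy data: 90 negative and 10 positive samples.
labels = ["neg"] * 90 + ["pos"] * 10
ratio = imbalance_ratio(labels)
```

A ratio of 1 means the classes are balanced; larger values indicate a more skewed dataset.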

A novel imbalanced data classification approach using both under and over sampling

Seyyed Mohammad Javadi Moghaddam, Asadollah Noroozi
2021 Bulletin of Electrical Engineering and Informatics  
Data classification performance suffers when the data distribution is imbalanced.  ...  Moreover, a cluster-based approach is performed to decrease the majority class, taking into consideration the new size of the minority class.  ...  ACKNOWLEDGEMENTS The author would like to acknowledge the financial support of the Bozorgmehr University of Qaenat for this research under contract number 39141.  ... 
doi:10.11591/eei.v10i5.2785 fatcat:xafchkmabjbptec257qqglmvfa

Cluster-based under-sampling approaches for imbalanced data distributions

Show-Jane Yen, Yue-Shi Lee
2009 Expert systems with applications  
In this paper, we propose cluster-based under-sampling approaches for selecting the representative data as training data to improve the classification accuracy for the minority class and investigate the effect  ...  The experimental results show that our cluster-based under-sampling approaches outperform the other under-sampling techniques in the previous studies.  ...  In this study, we propose cluster-based under-sampling approaches to solve the imbalanced class distribution problem by using a backpropagation neural network.  ... 
doi:10.1016/j.eswa.2008.06.108 fatcat:btwnfjjx35gptcavitzdc26ngy
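
Several entries in this list describe cluster-based under-sampling, that is, choosing representative majority-class samples via clustering. The sketch below shows one generic variant and is not the procedure of any specific paper listed here: it runs plain k-means on the majority class and keeps the sample nearest to each centroid. The helper names `kmeans` and `cluster_undersample`, the synthetic data, and the assumption that majority points are distinct are all illustrative.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids

def cluster_undersample(majority, n_keep, seed=0):
    """Keep n_keep majority points: the unused point nearest to each centroid.

    Assumes the majority points are distinct and len(majority) >= n_keep.
    """
    centroids = kmeans(majority, n_keep, seed=seed)
    kept = []
    for c in centroids:
        candidates = [p for p in majority if p not in kept]
        kept.append(min(candidates, key=lambda p: math.dist(p, c)))
    return kept

# Hypothetical synthetic data: 100 majority and 10 minority points in 2-D.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

reduced = cluster_undersample(majority, n_keep=len(minority))
balanced = [(p, 0) for p in reduced] + [(p, 1) for p in minority]
```

Selecting one point per cluster spreads the retained majority samples across the regions of the majority class, whereas random under-sampling may by chance drop an entire region.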

PBCCUT- Priority based Class Clustered under Sampling Technique Approaches for Imbalanced Data Classification

Nagasuri Anuradha, G. Partha Saradhi Varma
2017 Indian Journal of Science and Technology  
Improvements: The present paper proposes a cluster-based priority under-sampling approach that selects representative data as training data to improve classification accuracy for the minority class  ...  This research paper evaluates the accuracy obtained with the Priority Based Class Clustered under Sampling Technique approaches for imbalanced data classification.  ...  Data mining techniques 3, which center on the study of effective and efficient algorithms to change  ... 
doi:10.17485/ijst/2017/v10i18/107590 fatcat:ddkvwykg75bwxlevjiqzhimykq

Learning Statistical Representation with Joint Deep Embedded Clustering [article]

Mina Rezaei, Emilio Dorigatti, David Ruegamer, Bernd Bischl
2021 arXiv   pre-print
However, these approaches are sensitive to imbalanced data and out-of-distribution samples. Hence, these methods optimize clustering by pushing data close to randomly initialized cluster centers.  ...  StatDEC simultaneously trains two deep learning models, a deep statistics network that captures the data distribution, and a deep clustering network that learns embedded features and performs clustering  ...  Second, we study the impact of our proposed method for learning out-of-distribution samples and the ability to handle the imbalanced data situation.  ... 
arXiv:2109.05232v1 fatcat:m2cla2r6tfe5dg2bq65zvlvbhm

A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem

Duygu Sinanc Terzi, Seref Sagiroglu
2019 Applied Computer Systems  
To resolve this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID).  ...  The first strategy has been designed to present the success of the model on data sets with different imbalanced ratios.  ...  ACKNOWLEDGMENT The authors wish to thank Gazi University Project Unit (GAZİ BAP) for its contribution to the Gazi University Big Data and Information Security Center (GAZİ BIDISEC) under grant 06/2015-  ... 
doi:10.2478/acss-2019-0013 fatcat:pfo4cgve5bee7cs3wecnoerf3e

GCN-Based Linkage Prediction for Face Clustering on Imbalanced Datasets: An Empirical Study [article]

Huafeng Yang, Xingjian Chen, Fangyi Zhang, Guangyue Hei, Yunjie Wang, Rong Du
2021 arXiv   pre-print
However, little attention has been paid to GCN-based clustering on imbalanced data.  ...  The problem of imbalanced linkage labels is similar to that in the image classification task, but the latter is a particular problem in GCN-based clustering via linkage prediction.  ...  However, in the real world, the data are more likely to follow an imbalanced distribution, which poses a great challenge.  ... 
arXiv:2107.02477v2 fatcat:p6thznegwjfslitl5zvyx45cna

Adaptive Semi-Unsupervised Weighted Oversampling with Sparsity Factor for Imbalanced Biomedical Data

Haseeb Ali, Nurul Ashikin Samat, Hafiz Maaz Ashgher (Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400 Johor, Malaysia)
2020 Journal of Soft Computing and Data Mining  
Acknowledgement The authors would like to thank Universiti Tun Hussein Onn Malaysia (UTHM) for supporting this research under Postgraduate Incentive Research Grant, Vote No. H334.  ...  sub-cluster. The probability distribution of the sample for data generation is calculated on the basis of weights assigned to the minority samples.  ...  These approaches can be divided into three categories [17]: (a) data level approach, which is a preprocessing stage for rebalancing the distribution of samples in the classes [18], (b) algorithmic approach  ... 
doi:10.30880/jscdm.2020.01.01.003 fatcat:bhied2gkxjhpzpnhin2p6pkjqq

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification [article]

Farshid Rayhan, Sajid Ahmed, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid
2017 arXiv   pre-print
In this paper, we introduce a new clustering-based under-sampling approach with boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification.  ...  The proposed algorithm provides an alternative to RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms.  ...  Fig. 1: Sampling with boosting for classifying imbalanced data. Fig. 2: Random under-sampling (RUS) approach.  ... 
arXiv:1712.04356v1 fatcat:55wzik33ingsrih4zy3zzfphgy

Affinity Propagation SMOTE approach for Imbalanced dataset used in Predicting Student at Risk of Low Performance

B. Laureano Lanie
2020 International Journal of Advanced Trends in Computer Science and Engineering  
This approach uses affinity propagation to automatically produce clusters and cluster exemplars used to select the clusters to be oversampled.  ...  SMOTE and AP SMOTE are applied to the imbalanced dataset.  ...  ACKNOWLEDGEMENT The authors would like to thank the proponents of a DOSCST Institutional Study for the dataset provided for this paper.  ... 
doi:10.30534/ijatcse/2020/127942020 fatcat:cyosphylfzgo3ieom63wqb7rgq

Rotating Machinery Fault Diagnosis for Imbalanced Data Based on Fast Clustering Algorithm and Support Vector Machine

Xiaochen Zhang, Dongxiang Jiang, Te Han, Nanfei Wang, Wenguang Yang, Yizhou Yang
2017 Journal of Sensors  
To diagnose rotating machinery fault for imbalanced data, a method based on fast clustering algorithm (FCA) and support vector machine (SVM) was proposed.  ...  Next, a fast clustering algorithm was adopted to reduce the number of the majority data from the imbalanced fault sample set.  ...  Therefore, we proposed an approach based on a fast clustering algorithm to reduce the number of the majority data from the imbalanced data.  ... 
doi:10.1155/2017/8092691 fatcat:65bkedifdzactpwjhhhkdn3xqi