A first attempt on global evolutionary undersampling for imbalanced big data

I. Triguero, M. Galar, H. Bustince, F. Herrera
2017 2017 IEEE Congress on Evolutionary Computation (CEC)  
doi:10.1109/cec.2017.7969553 dblp:conf/cec/TrigueroGBH17 fatcat:x3kavbn4indepf2o2ph3lkeege

An insight into imbalanced Big Data classification: outcomes and challenges

Alberto Fernández, Sara del Río, Nitesh V. Chawla, Francisco Herrera
2017 Complex & Intelligent Systems  
Being still a recent discipline, few research has been conducted on imbalanced classification for Big Data.  ...  First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current research state of this area.  ...  First, "Preliminaries" section presents an introduction on classification with imbalanced datasets, and a short description for Big Data and the MapReduce framework.  ... 
doi:10.1007/s40747-017-0037-9 fatcat:ylhtsi7jqbfr7au7livutzvqxu

Smart Data driven Decision Trees Ensemble Methodology for Imbalanced Big Data [article]

Diego García-Gil, Salvador García, Ning Xiong, Francisco Herrera
2021 arXiv   pre-print
In this paper, we propose a novel Smart Data driven Decision Trees Ensemble methodology for addressing the imbalanced classification problem in Big Data domains, namely SD_DeTE methodology.  ...  Big Data scenarios pose a new challenge to traditional imbalanced classification algorithms, since they are not prepared to work with such amount of data.  ...  An Smart Data driven Decision Trees Ensemble Methodology for Imbalanced Big Data In this section, we describe in detail the proposed ensemble methodology for imbalanced Big Data classification based on  ... 
arXiv:2001.05759v3 fatcat:foeawm37nbfaxgtw6d36pnz5u4

An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics

Victoria López, Alberto Fernández, Salvador García, Vasile Palade, Francisco Herrera
2013 Information Sciences  
A study on the Scalability of FRBCSs for Imbalanced Datasets in the Big Data Scenario.  ...  Herrera, Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling, Soft Computing 15 (10) (2011) 1909–1936. [90] R.  ... 
doi:10.1016/j.ins.2013.07.007 fatcat:dtuhxqu7hzfclktmqlci5ykiw4

Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets

Mikel Galar, Alberto Fernández, Edurne Barrenechea, Humberto Bustince, Francisco Herrera
2016 Information Sciences  
The goal of this work is to improve the capabilities of tree-based ensemble-based solutions that were specifically designed for imbalanced classification, focusing on the best behaving bagging- and boosting-based  ...  The scenario of classification with imbalanced datasets has gained a notorious significance in the last years.  ...  As in the standard case study, ensembles of classifiers are a very valuable tool for addressing the imbalanced classification problem in a Big Data scenario [16, 80].  ... 
doi:10.1016/j.ins.2016.02.056 fatcat:3sryshnr65cu3hivmvxhzg2nbe

A Comparative Analysis of Machine Learning Models for Banking News Extraction by Multiclass Classification With Imbalanced Datasets of Financial News: Challenges and Solutions

Varun Dogra, Sahil Verma, Kavita Verma, Nz Jhanjhi, Uttam Ghosh, Dac-Nhuong Le
2022 International Journal of Interactive Multimedia and Artificial Intelligence  
The news articles divided into the mentioned classes were imbalanced. Imbalance data is a big difficulty with most classifier learning algorithms.  ...  A critical component of the aforementioned system should, therefore, include one module for extracting and storing news articles, and another module for classifying these text documents into a specific  ...  Evolutionary undersampling outperforms the non-evolutionary models by increasing the degree of imbalance [41] .  ... 
doi:10.9781/ijimai.2022.02.002 fatcat:rl7mwnqqwjf7bjfpfsxg2fsb2i

Evolutionary Machine Learning: A Survey

Akbar Telikani, Amirhessam Tahmassebi, Wolfgang Banzhaf, Amir H. Gandomi
2022 ACM Computing Surveys  
Evolutionary Computation (EC) approaches are inspired by nature and solve optimization problems in a stochastic manner.  ...  For each category, we discuss evolutionary machine learning in terms of three aspects: problem formulation, search mechanisms, and fitness value computation.  ...  EML on big data: Big data offers new opportunities for ML, but it also brings challenges such as computational costs, huge high-dimensional sample sizes, storage impasse, and error extent [161] .  ... 
doi:10.1145/3467477 fatcat:o6m3nekqfnaudjnxxoeferhine

Arabic Authorship Attribution Using Synthetic Minority Over-Sampling Technique and Principal Components Analysis for Imbalanced Documents

Hassina Hadjadj, Halim Sayoud
2021 International Journal of Cognitive Informatics and Natural Intelligence  
attribution on imbalanced data.  ...  Nowadays, dealing with imbalanced data represents a great challenge in data mining as well as in machine learning task.  ...  ACKNOWLEDGMENT We would like to thank warmly the editor-in-chief and the reviewers for their valuable comments.  ... 
doi:10.4018/ijcini.20211001.oa33 fatcat:uxonzf2v2ndnpeuhi3p7lcetm4

A Cost-Sensitive Deep Belief Network for Imbalanced Classification [article]

Chong Zhang, Kay Chen Tan, Haizhou Li, Geok Soon Hong
2018 arXiv   pre-print
This paper proposes an evolutionary cost-sensitive deep belief network (ECS-DBN) for imbalanced classification.  ...  However, conventional DBN does not work well for imbalanced data classification because it assumes equal costs for each class.  ...  ACKNOWLEDGMENT Chong Zhang and Haizhou Li were supported by Neuromorphic Computing Program, RIE2020 AME Programmatic Grant, A*STAR, Singapore.  ... 
arXiv:1804.10801v2 fatcat:f44dbkugarc4dc3oklhtc6sn6e

Survey on deep learning with class imbalance

Justin M. Johnson, Taghi M. Khoshgoftaar
2019 Journal of Big Data  
deep learning techniques for addressing class imbalanced data.  ...  Finally, the rise of big data analytics and its challenges are introduced along with a discussion on the role of deep learning in solving these challenges.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their constructive evaluation of this paper, and the various members of the Data Mining and Machine Learning Laboratory, Florida  ... 
doi:10.1186/s40537-019-0192-5 fatcat:dor65fgn7ffhxmqqv3mkold6wq

JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method

Lina Zhang, Chengjin Zhang, Rui Gao, Runtao Yang
2015 BioMed Research International  
To deal with the imbalanced benchmark dataset, the synthetic minority oversampling technique (SMOTE) and undersampling technique are applied.  ...  It is anticipated that JPPRED can be a potential candidate for J-protein prediction.  ...  The authors also would like to thank HSPIR, CD-HIT, and PSI-BLAST for supplying related data applied in this study.  ... 
doi:10.1155/2015/705156 pmid:26587542 pmcid:PMC4637456 fatcat:ufcgxbad5zfj7axkek26tabo3e

Performance analysis of cost-sensitive learning methods with application to imbalanced medical data

Ibomoiye Domor Mienye, Yanxia Sun
2021 Informatics in Medicine Unlocked  
This research aims to provide a general overview of the imbalanced classification problem and ML algorithms suitable for such classification problems focusing on medical data.  ...  Cost-sensitive learning aims to minimize the misclassification cost of a model on the input data.  ...  Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence  ... 
doi:10.1016/j.imu.2021.100690 fatcat:fbxsyvzv2zdvfcn6mina5nxdme

A Compact Evolutionary Interval-Valued Fuzzy Rule-Based Classification System for the Modeling and Prediction of Real-World Financial Applications With Imbalanced Data

Jose Antonio Sanz, Dario Bernardo, Francisco Herrera, Humberto Bustince, Hani Hagras
2015 IEEE transactions on fuzzy systems  
Most classifiers designed for minimizing the global error rate perform poorly on imbalanced datasets because they misclassify most of the data belonging to the class represented by few examples [1],  ...  In this paper, we will present a compact evolutionary IV-FRBCS based on IVTURSFARC-HD for the modeling and prediction of financial applications with imbalanced data, with the aim of providing an accurate  ... 
doi:10.1109/tfuzz.2014.2336263 fatcat:o3svt7k6ovc3lejg6a66qk6p6m

Ensemble of Cost-Sensitive Hypernetworks for Class-Imbalance Learning

Jin Wang, Ping-li Huang, Kai-wei Sun, Bao-lin Cao, Rui Zhao
2013 2013 IEEE International Conference on Systems, Man, and Cybernetics  
We propose a new learning algorithm based on the idea of NCL, named AdaBoost.NC, for classification problems.  ...  Class imbalance learning refers to learning from imbalanced data sets, in which some classes of examples (minority) are highly under-represented comparing to other classes (majority).  ...  One attempts to measure the similarity for a new input to the learnt class, so as to judge whether it belongs to this class.  ... 
doi:10.1109/smc.2013.324 dblp:conf/smc/WangHSCZ13 fatcat:kniqcwmozbh2ni53tpgwtsfa4e

Detecting the Risk of Customer Churn in Telecom Sector: A Comparative Study

Nabahirwa Edwine, Wenjuan Wang, Wei Song, Denis Ssebuggwawo, Melih Yucesan
2022 Mathematical Problems in Engineering  
The experimental results have revealed that the RF algorithm optimized by Grid Search based on a low-ratio undersampling strategy (RF-GS-LR) outperformed other models in extracting hidden information and  ...  Identifying churn-risk customers is essential for telecom sectors to retain old customers and maintain a higher competitive advantage.  ...  GA is a metaheuristic algorithm based on evolutionary theory.  ... 
doi:10.1155/2022/8534739 fatcat:y555mtghrnfdpct3blwb6i4bvy