Multiple Relevant Feature Ensemble Selection Based on Multilayer Co-Evolutionary Consensus MapReduce

Weiping Ding, Chin-Teng Lin, Witold Pedrycz
2018 IEEE Transactions on Cybernetics  
Although feature selection for large data has been intensively investigated in data mining, machine learning, and pattern recognition, the challenges are not just to invent new algorithms to handle noisy and uncertain large data in applications, but rather to link the multiple relevant feature sources, structured or unstructured, to develop an effective feature reduction method. In this paper, we propose a multiple relevant feature ensemble selection (MRFES) algorithm based on multilayer co-evolutionary consensus MapReduce (MCCM). We construct an effective MCCM model to handle feature ensemble selection of large-scale datasets with multiple relevant feature sources, and explore the unified consistency aggregation between the local solutions and global dominance solutions achieved by the co-evolutionary memeplexes, which participate in the cooperative feature ensemble selection process. This model attempts to reach a mutual decision agreement among co-evolutionary memeplexes, which calls for mechanisms to detect some noncooperative co-evolutionary behaviors and achieve better Nash equilibrium resolutions. Extensive experimental comparative studies substantiate the effectiveness of MRFES in solving large-scale dataset problems with complex noise and multiple relevant feature sources on some well-known benchmark datasets. The algorithm can greatly facilitate the selection of relevant feature subsets coming from the original feature space with better accuracy, efficiency, and interpretability. Moreover, we apply MRFES to human cerebral cortex-based classification prediction. Such successful applications are expected to significantly scale up classification prediction for large-scale and complex brain data in terms of efficiency and feasibility.

Index Terms: Cerebral cortex classification, co-evolutionary consensus MapReduce, consistency aggregation, multiple relevant feature selection, Nash equilibrium.

I. INTRODUCTION

In recent years, massive amounts of data have become available for all kinds of industrial applications, and big data has emerged as an important research topic and a visible application domain. Big data analytics can definitely reveal valuable knowledge. Reflecting the very nature of the data in big data, we often refer to the so-called "5V" aspects: 1) volume; 2) variety; 3) velocity; 4) veracity; and 5) value [1]. It is critical to extract knowledge and build models from big data. But the real challenge comes with the requisition of such knowledge, which is quantitative, defined across multiple space-time scales, and capable of prediction with sufficient accuracy [2]. It poses evident demands on conventional methods currently used in data mining and machine learning, including transmission, storage, processing, and optimization [3], [4]. Furthermore, many big datasets increase dynamically in size and contain various elements of noise. Many features are likely redundant or irrelevant. These useless features often diminish the learning process associated with classification algorithms, decreasing their overall performance. Various real-world big data applications can be formulated as feature selection problems. Hence, to analyze the high-dimensional datasets with huge numbers of features
doi:10.1109/tcyb.2018.2859342 pmid:30130243 fatcat:lk7dgvlfhjcj3exex3ddlgss7q