59,234 Hits in 5.1 sec

Research on Optimization of Random Forest Algorithm Based on Spark

Suzhen Wang, Zhanfeng Zhang, Shanshan Geng, Chaoyi Pang
2022 Computers Materials & Continua  
This improved random forest algorithm performs feature extraction according to the calculated feature importance to form a feature subspace.  ...  However, the random forest algorithm uses a simple random sampling feature selection method when generating feature subspaces, which cannot distinguish redundant features, thereby affecting its classification  ...  Assuming there are Sub features in the feature subspace, the number of strongly correlated features in the subspace, Num_NS, is Num_NS = Sub · S_NS (Eq. 8), where S_NS is the proportion of the importance  ... 
doi:10.32604/cmc.2022.015378 fatcat:yfm4i6t3q5e4pey6sj5ragunva
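The snippet's Eq. (8) can be sketched in code. The function below is a hypothetical illustration (the names `sub_size` for Sub and `s_ns` for S_NS follow the snippet; the paper's actual selection procedure may differ):

```python
import random

def build_feature_subspace(importances, sub_size, s_ns):
    """Importance-guided subspace selection (illustrative sketch).

    importances: dict mapping feature name -> importance score
    sub_size:    Sub, the total number of features per subspace
    s_ns:        proportion of the subspace reserved for strongly
                 correlated (high-importance) features, per Eq. (8):
                 Num_NS = Sub * S_NS
    """
    num_ns = int(round(sub_size * s_ns))            # Eq. (8)
    ranked = sorted(importances, key=importances.get, reverse=True)
    strong = ranked[:num_ns]                        # top-importance features
    rest = random.sample(ranked[num_ns:], sub_size - num_ns)
    return strong + rest
```

With `sub_size=4` and `s_ns=0.5`, half of each subspace is guaranteed to come from the highest-importance features, while the remainder is drawn at random.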

Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

Thanh-Tung Nguyen, Joshua Zhexue Huang, Thuy Thi Nguyen
2015 The Scientific World Journal  
However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting.  ...  This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees.  ...  a random forest uses in-bag samples to produce a kind of importance measure, called an in-bag importance score.  ... 
doi:10.1155/2015/471371 pmid:25879059 pmcid:PMC4387916 fatcat:srvuskevzbchtoiolzw2pkeulq

Online sketching for big data subspace learning

Morteza Mardani, Georgios B. Giannakis
2015 2015 23rd European Signal Processing Conference (EUSIPCO)  
Leveraging the online subspace updates, we introduce a notion of importance score, which is subsequently adapted into a randomization scheme to predict a minimal subset of important features to acquire in the next time instant.  ...  For a prescribed maximum sample count, one can then draw random trials from the distribution to collect the important features in the set Ω.  ... 
doi:10.1109/eusipco.2015.7362837 dblp:conf/eusipco/MardaniG15 fatcat:mt2plztsvfebrajmtaornol2ou

A Novel Random Subspace Method for Online Writeprint Identification

Zhi Liu, Zongkai Yang, Sanya Liu, Wenting Meng
2012 Journal of Computers  
In this paper, we propose a novel random subspace method that constructs a set of stable classifiers to take advantage of nearly all the discriminative information in the high-dimensional feature space  ...  random subspace methods.  ...  ACKNOWLEDGMENT This work was supported by the National Key Technology R&D Program in the 12th Five-Year Plan (Grant No. 2011BAK08B03, 2011BAK08B05), Program for New Century Excellent Talents in University  ... 
doi:10.4304/jcp.7.12.2997-3004 fatcat:iccamncnb5hfrol7xeq5gjo64m

Bagging and the Random Subspace Method for Redundant Feature Spaces [chapter]

Marina Skurichina, Robert P. W. Duin
2001 Lecture Notes in Computer Science  
In this paper, using the example of the pseudo Fisher linear classifier, we study the effect of redundancy in the data feature set on the performance of the random subspace method and bagging.  ...  The performance of a single weak classifier can be improved by using combining techniques such as bagging, boosting and the random subspace method.  ...  In order to construct good classifiers in random subspaces, it is important that each subspace contain as much useful information as possible.  ... 
doi:10.1007/3-540-48219-9_1 fatcat:nbj72qlwgbf4bjojwnq2azxl6y

Semi-supervised Text Categorization by Considering Sufficiency and Diversity [chapter]

Shoushan Li, Sophia Yat Mei Lee, Wei Gao, Chu-Ren Huang
2013 Communications in Computer and Information Science  
Moreover, we further improve the random feature subspace-based bootstrapping with some constraints on the subspace generation to better satisfy the diversity preference.  ...  After carefully considering the diversity preference, we modify the traditional bootstrapping algorithm by training the involved classifiers with random feature subspaces instead of the whole feature space.  ...  Bootstrapping algorithm with random subspace classifiers: the size of the feature subset r is an important parameter in this algorithm.  ... 
doi:10.1007/978-3-642-41644-6_11 fatcat:ovcg5rk6inartgessrv6ccon7q

Weighted random subspace method for high dimensional data classification

Xiaoye Li, Hongyu Zhao
2009 Statistics and its Interface  
The aggregating algorithms, e.g. the bagging predictor, the boosting algorithm, the random subspace method, and the Random Forests algorithm, are promising in handling high dimensional data.  ...  We have applied the proposed weight assignment procedures to the random subspace method to develop a weighted random subspace method.  ...  ACKNOWLEDGEMENTS This work was supported in part from NHLB/NIH contract N01-HV-28186, NIDA/NIH grant P30 DA 018343-01, and NIGMS grant R01 GM 59507. Received 29 September 2008  ... 
doi:10.4310/sii.2009.v2.n2.a5 pmid:21918713 pmcid:PMC3170928 fatcat:a65wx6f3one7xcbtmn6i4gwyey
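The weighted random subspace idea in the entry above can be sketched as a weighted draw of features without replacement. This is an illustrative sketch, not the paper's exact weight-assignment procedure:

```python
import random

def weighted_subspace(features, weights, size):
    """Draw `size` distinct features, with selection probability
    proportional to each feature's weight (a simple sequential
    weighted draw without replacement; the published scheme may
    assign and normalize weights differently)."""
    pool, w = list(features), list(weights)
    picked = []
    for _ in range(size):
        i = random.choices(range(len(pool)), weights=w, k=1)[0]
        picked.append(pool.pop(i))
        w.pop(i)
    return picked
```

Heavily weighted features appear in many subspaces, so the ensemble's base classifiers are biased toward informative dimensions while still varying from one subspace to the next.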

Super RaSE: Super Random Subspace Ensemble Classification

Jianan Zhu, Yang Feng
2021 Journal of Risk and Financial Management  
In this work, we show that Super RaSE avoids the need to choose a base classifier by randomly sampling a collection of classifiers together with the subspace.  ...  We propose a new ensemble classification algorithm, named super random subspace ensemble (Super RaSE), to tackle the sparse classification problem.  ...  In addition, the increase in sample size leads to the selection of all important features almost 100% of the time.  ... 
doi:10.3390/jrfm14120612 fatcat:m5bjw6hihzawnnpj7yw5eosiqa
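The key Super RaSE step described in the snippet, sampling a base classifier jointly with a feature subspace, can be sketched as follows (the interface is an assumption for illustration; it is not the authors' API):

```python
import random

def super_rase_draw(classifiers, features, subspace_size):
    """One joint draw in the spirit of Super RaSE: instead of fixing
    a single base learner in advance, pick a classifier type at random
    along with a random feature subspace for it to train on."""
    clf = random.choice(classifiers)                 # e.g. "lda", "knn", ...
    subspace = random.sample(features, subspace_size)
    return clf, subspace
```

Repeating this draw many times and aggregating the fitted pairs yields an ensemble that never commits to one base classifier.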

Stratified sampling for feature subspace selection in random forests for high dimensional data

Yunming Ye, Qingyao Wu, Joshua Zhexue Huang, Michael K. Ng, Xutao Li
2013 Pattern Recognition  
Random forest algorithms tend to use a simple random sampling of features in building their decision trees and consequently select many subspaces that contain few, if any, informative features.  ...  In this paper we propose a stratified sampling method to select the feature subspaces for random forests with high dimensional data. The key idea is to stratify features into two groups.  ...  In Section 2, we review random forests. In Section 3, we present the stratified sampling method for feature subspace selection.  ... 
doi:10.1016/j.patcog.2012.09.005 fatcat:6qdqqjigqzfsjk6ucj22nigphy
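The stratified idea in the entry above, splitting features into two groups and sampling from each, can be sketched in a few lines. The fraction drawn from the informative stratum is an assumed parameter, not the paper's setting:

```python
import random

def stratified_subspace(informative, uninformative, size, frac_informative=0.7):
    """Stratified feature-subspace sampling (illustrative sketch):
    draw a fixed fraction of the subspace from the informative stratum
    so that every subspace contains at least some informative features,
    unlike plain random sampling in high dimensions."""
    n_inf = min(len(informative), max(1, int(size * frac_informative)))
    picked = random.sample(informative, n_inf)
    picked += random.sample(uninformative, size - n_inf)
    return picked
```

With few informative features among thousands of noise features, plain random sampling often yields subspaces with no signal at all; stratification rules that case out by construction.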

Random Sampling for Subspace Face Recognition

Xiaogang Wang, Xiaoou Tang
2006 International Journal of Computer Vision  
Instead of pursuing a single optimal subspace, we develop an ensemble learning framework based on random sampling on all three key components of a classification system: the feature space, training samples  ...  In addition, we further apply random sampling on parameter selection in order to overcome the difficulty of selecting optimal parameters in our algorithms.  ...  Acknowledgements The work described in this paper was fully supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region and a joint grant (N CUHK409/03) from HKSAR  ... 
doi:10.1007/s11263-006-8098-z fatcat:kburavsjn5gfddyrytcol7jqce

An Improved Random Forest Classifier for Text Categorization

Baoxun Xu, Xiufeng Guo, Yunming Ye, Jiefeng Cheng
2012 Journal of Computers  
With the new feature weighting method for subspace sampling and the tree selection method, we can effectively reduce subspace size and improve classification performance without increasing the error bound.  ...  The results demonstrate that this improved random forest outperformed the popular text classification methods in terms of classification performance.  ...  In future work, we will test other feature weighting methods for optimizing the random sampling subspace used in random forests.  ... 
doi:10.4304/jcp.7.12.2913-2920 fatcat:p3z4ml3zlfcujfbd6sglgskbei

Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, Yunming Ye
2012 International Journal of Data Warehousing and Mining  
Using simple random sampling results in informative features not being included in subspaces (Amaratunga, Cabrera, & Lee, 2008).  ...  To build decision trees with improved performance, it is important to select subspaces containing more informative features.  ... 
doi:10.4018/jdwm.2012040103 fatcat:dotaknxqunbujdxoedeb4nl4uu

Multi-feature canonical correlation analysis for face photo-sketch image retrieval

Dihong Gong, Zhifeng Li, Jianzhuang Liu, Yu Qiao
2013 Proceedings of the 21st ACM international conference on Multimedia - MM '13  
The MCCA is an extension and improvement of the canonical correlation analysis (CCA) algorithm using multiple features combined with two different random sampling methods in feature space and sample space  ...  Automatic face photo-sketch image retrieval has attracted great attention in recent years due to its important applications in real life.  ...  To solve these problems, we apply two popular random sampling methods: random subspace [17] and bagging [18] .  ... 
doi:10.1145/2502081.2502162 dblp:conf/mm/GongLLQ13 fatcat:565c2ujxxffkjl5or3ourmdyk4

Generating Diverse Ensembles to Counter the Problem of Class Imbalance [chapter]

T. Ryan Hoens, Nitesh V. Chawla
2010 Lecture Notes in Computer Science  
In this paper we propose an ensemble framework that combines random subspaces with sampling to overcome the class imbalance problem.  ...  In order to combat this, many techniques have been proposed, especially centered around sampling methods.  ...  Acknowledgements Work was supported in part by the NSF Grant ECCS-0926170 and the Notebaert Premier Fellowship.  ... 
doi:10.1007/978-3-642-13672-6_46 fatcat:vc5o5f6bend6pauz7yxjuoiwwy

Random Subspace Learning (RASSEL) with data driven weighting schemes

Mohamed Elshrif, Ernest Fokoué
2018 Mathematics for applications  
We present a novel adaptation of the random subspace learning approach to regression analysis and classification of high dimension low sample size data, in which the use of the individual strength of each  ...  The adaptation of random subspace learning presented in this paper differs from random forest in the following ways: (a) instead of using trees as RF does, we use multiple linear regression (MLR) as our  ...  Some earlier authors, like [23] in their recent work on stratified sampling for feature subspace selection in random forests for high-dimensional data, have weighted the trees comprising the random  ... 
doi:10.13164/ma.2018.02 fatcat:s7o5lziyejc6tk4m2fyptzjx5a
Showing results 1 — 15 out of 59,234 results