Streaming feature selection using alpha-investing

Jing Zhou, Dean Foster, Robert Stine, Lyle Ungar
2005 Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05  
In Streaming Feature Selection (SFS), new features are sequentially considered for addition to a predictive model. When the space of potential features is large, SFS offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. We describe α-investing, an adaptive complexity penalty method for SFS which dynamically adjusts the threshold on the error reduction required for adding a new feature. α-investing gives false discovery rate-style guarantees against overfitting. It differs from standard penalty methods such as AIC, BIC or RIC, which always drastically over- or under-fit in the limit of infinite numbers of non-predictive features. Empirical results show that SFS is competitive with much more compute-intensive feature selection methods such as stepwise regression, and allows feature selection on problems with over a million potential features.
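The abstract does not spell out the threshold update, so the following is only a minimal sketch of how an adaptive threshold of this kind might work. It assumes a "wealth" budget that is spent on each test and partly replenished when a feature is accepted, with the per-feature threshold set to wealth / (2i). The names `alpha_investing`, `p_value_fn`, `w0`, and `alpha_delta`, and the default values, are illustrative assumptions and are not taken from this record.

```python
from typing import Callable, List


def alpha_investing(
    num_features: int,
    p_value_fn: Callable[[int, List[int]], float],
    w0: float = 0.5,           # assumed initial wealth
    alpha_delta: float = 0.5,  # assumed payout earned when a feature is added
) -> List[int]:
    """Stream over features once, keeping those whose p-value beats
    an adaptive threshold.

    p_value_fn(index, selected) is a hypothetical caller-supplied function
    that returns the p-value of adding feature `index` to a model that
    already contains the features listed in `selected`.
    """
    wealth = w0
    selected: List[int] = []
    for i in range(1, num_features + 1):
        # Spend at most half of the current wealth, less for later features.
        alpha_i = wealth / (2 * i)
        p = p_value_fn(i - 1, selected)
        if p < alpha_i:
            # Feature accepted: earn alpha_delta, pay what was bid.
            selected.append(i - 1)
            wealth += alpha_delta - alpha_i
        else:
            # Feature rejected: the bid is lost, so the threshold tightens.
            wealth -= alpha_i
    return selected


if __name__ == "__main__":
    # Toy p-values that ignore the current model, for illustration only.
    toy_p = [0.001, 0.9, 0.8, 0.0001, 0.5]
    print(alpha_investing(len(toy_p), lambda idx, _sel: toy_p[idx]))
```

Because each bid is at most half of the remaining wealth, the budget never goes negative in this sketch, and a long run of rejected features steadily raises the bar for the next one. In a real setting `p_value_fn` would refit or update the model, for example with a t-test on the new coefficient in a regression.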
doi:10.1145/1081870.1081914 dblp:conf/kdd/ZhouFSU05 fatcat:ah63pvlfijcbrjgybx43hukpae