MDFS: MultiDimensional Feature Selection in R

Radosław Piliszek, Krzysztof Mnich, Szymon Migacz, Paweł Tabaszewski, Andrzej Sułecki, Aneta Polewko-Klim, Witold Rudnicki
2019 The R Journal  
Identification of informative variables in an information system is often performed using simple one-dimensional filtering procedures that discard information about interactions between variables. Such an approach may remove some relevant variables from consideration. Here we present the R package MDFS (MultiDimensional Feature Selection), which identifies informative variables while taking into account synergistic interactions between multiple descriptors and the decision variable. MDFS implements an algorithm based on information theory (Mnich and Rudnicki, 2017). The computational kernel of the package is implemented in C++, and a high-performance version implemented in CUDA C is also available. The application of MDFS is demonstrated on the well-known Madelon dataset, in which the decision variable is generated from synergistic interactions between descriptor variables. We show that multidimensional analysis yields better sensitivity and a better ranking of variable importance.
Theory
Kohavi and John proposed that a variable $x_i \in X$, where $X$ is the set of all descriptive variables, is weakly relevant if there exists a subset of variables $X_{sub} \subset X$, with $x_i \notin X_{sub}$, such that one can increase information on
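The central claim of the abstract is that one-dimensional filters can discard variables that are informative only jointly. A toy example makes this concrete. The Python sketch below is not the MDFS API or its estimator; the package's kernel is written in C++ and CUDA C.

The sketch rests on these assumptions:

* Variables are discrete, and entropies are plug-in Shannon estimates in bits.
* The dataset is a hypothetical XOR example: $y = x_1 \oplus x_2$, and $x_3$ is pure noise.
* The truncated weak-relevance definition above is read in information-theoretic terms. Under this reading, $x_1$ counts as weakly relevant if adding it to $X_{sub} = \{x_2\}$ increases the mutual information with $y$.

The helper names `entropy` and `mutual_information` are illustrative only.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(y, *xs):
    """I(y; x_1, ..., x_k) = H(y) + H(X) - H(y, X), where X is the joint tuple of the given columns."""
    joint_x = list(zip(*xs))
    joint_xy = list(zip(y, joint_x))
    return entropy(y) + entropy(joint_x) - entropy(joint_xy)


# Hypothetical toy data (not Madelon): all 8 combinations of three binary variables.
rows = list(product([0, 1], repeat=3))
x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]
x3 = [r[2] for r in rows]  # noise variable, unrelated to y
y = [a ^ b for a, b in zip(x1, x2)]  # decision variable: synergistic XOR of x1 and x2

# One-dimensional filter: score each variable on its own.
for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, "1D:", mutual_information(y, x))

# Weak-relevance check (assumed reading): information gained by adding x1 to X_sub = {x2}.
gain = mutual_information(y, x1, x2) - mutual_information(y, x2)
print("I(y; x1, x2) - I(y; x2) =", gain)
```

Each variable alone scores 0 bits, so a one-dimensional filter cannot separate $x_1$ and $x_2$ from the noise variable $x_3$. The pair $(x_1, x_2)$, however, carries the full 1 bit about $y$. This is the kind of synergy a multidimensional search is meant to detect.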
doi:10.32614/rj-2019-019