A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Random Forests, Decision Trees, and Categorical Predictors: The "Absent Levels" Problem
[article]
2018
arXiv
pre-print
One advantage of decision tree based methods like random forests is their ability to natively handle categorical predictors without having to first transform them (e.g., by using feature engineering techniques). However, in this paper, we show how this capability can lead to an inherent "absent levels" problem for decision tree based methods that has never been thoroughly discussed, and whose consequences have never been carefully explored. This problem occurs whenever there is an indeterminacy
arXiv:1706.03492v2
fatcat:3fcncob2xvdsnncnk2wer3rxmi