Rule extraction using genetic programming for accurate sales forecasting

Rikard Konig, Ulf Johansson
2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
The purpose of this paper is to propose and evaluate a method for reducing the inherent tendency of genetic programming to overfit small and noisy data sets. In addition, the use of different optimization criteria for symbolic regression is demonstrated. The key idea is to reduce the risk of overfitting noise in the training data by introducing an intermediate predictive model in the process. More specifically, instead of directly evolving a genetic regression model based on labeled training data, the first step is to generate a highly accurate ensemble model. Since ensembles are very robust, the resulting predictions will contain less noise than the original data set. In the second step, an interpretable model is evolved, using the ensemble predictions, instead of the true labels, as the target variable. Experiments on 175 sales forecasting data sets, from one of Sweden's largest wholesale companies, show that the proposed technique obtained significantly better predictive performance than both straightforward use of genetic programming and the standard M5P technique. Naturally, the level of improvement depends critically on the performance of the intermediate ensemble.
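The two-step idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a bagged k-nearest-neighbour regressor stands in for the ensemble, a closed-form least-squares line stands in for the model that the paper evolves with genetic programming, and the data are synthetic. All function names (`knn_predict`, `build_ensemble`, `ensemble_predict`, `fit_line`) are hypothetical.

```python
import random
import statistics


def knn_predict(train, x, k=5):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def build_ensemble(data, n_members=25, seed=0):
    """Bagging: each ensemble member is a bootstrap resample of the data."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_members)]


def ensemble_predict(members, x, k=5):
    """Ensemble prediction = mean of the members' k-NN predictions."""
    return statistics.fmean([knn_predict(m, x, k) for m in members])


def fit_line(xs, ys):
    """Least-squares line; a stand-in for the evolved interpretable model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic noisy data (assumption: true relation y = 2x + 5 plus noise).
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 5.0 + rng.gauss(0.0, 3.0) for x in xs]
data = list(zip(xs, ys))

# Step 1: build the ensemble and replace the noisy labels by its predictions.
members = build_ensemble(data)
smoothed = [ensemble_predict(members, x) for x in xs]

# Step 2: fit the interpretable model to the ensemble predictions,
# not to the original labels.
slope, intercept = fit_line(xs, smoothed)
print(f"interpretable model: y = {slope:.2f} * x + {intercept:.2f}")
```

The only structural point the sketch is meant to show is that the second model is trained against `smoothed` rather than `ys`; as the abstract notes, how much this helps depends on how good the intermediate ensemble is.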
doi:10.1109/cidm.2014.7008669 dblp:conf/cidm/KonigJ14 fatcat:lrn23vyzjnen7gjvs7irop347u