Scaling Genetic Programming for data classification using MapReduce methodology

Nailah Al-Madi, Simone A. Ludwig
2013 World Congress on Nature and Biologically Inspired Computing
Genetic Programming (GP) is an optimization method that has been shown to achieve good results. It solves problems by generating programs and applying operators inspired by natural evolution to these programs until a good solution is found. GP has been used to solve many classification problems; however, its drawback is the long execution time. When GP is applied to a classification task, the execution time increases proportionally with the dataset size. Therefore, to manage the long execution time, the GP algorithm is parallelized in order to speed up the classification process. Our GP is implemented based on the MapReduce methodology (abbreviated as MRGP), in order to benefit from the MapReduce concept in terms of fault tolerance, load balancing, and data locality. MRGP not only reduces the execution time of GP for large datasets, it also makes it possible to use large population sizes, thus finding the best result in fewer generations. MRGP is evaluated using population sizes ranging from 1,000 to 100,000, measuring accuracy, scalability, and speedup.
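The abstract does not say how MRGP divides the work between map and reduce tasks. As a rough illustration only, the following Python sketch shows one way classification fitness could be computed in a MapReduce style: it assumes the dataset is split into chunks, each map call counts correct predictions per candidate program on its chunk, and the reduce step sums those counts into an accuracy. The names `map_fitness`, `reduce_fitness`, and `evaluate` are hypothetical, plain Python callables stand in for evolved GP programs, and no actual MapReduce framework is used.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Record = Tuple[Sequence[float], int]        # (features, class label)
Program = Callable[[Sequence[float]], int]  # hypothetical stand-in for a GP tree


def map_fitness(chunk: List[Record],
                population: Dict[int, Program]) -> Iterable[Tuple[int, int]]:
    """Map step (assumed design): emit (program_id, correct_count) for one chunk."""
    for pid, program in population.items():
        correct = sum(1 for features, label in chunk if program(features) == label)
        yield pid, correct


def reduce_fitness(pairs: Iterable[Tuple[int, int]], total: int) -> Dict[int, float]:
    """Reduce step: sum the per-chunk counts for each program, then divide by N."""
    sums: Dict[int, int] = defaultdict(int)
    for pid, correct in pairs:
        sums[pid] += correct
    return {pid: correct / total for pid, correct in sums.items()}


def evaluate(population: Dict[int, Program],
             dataset: List[Record], n_chunks: int = 2) -> Dict[int, float]:
    """Run the map step over every chunk sequentially, then reduce."""
    chunks = [dataset[i::n_chunks] for i in range(n_chunks)]
    emitted = [pair for chunk in chunks for pair in map_fitness(chunk, population)]
    return reduce_fitness(emitted, len(dataset))


# Toy data: one feature, two classes.
dataset = [([0.2], 0), ([0.8], 1), ([0.4], 0), ([0.9], 1)]
population = {
    0: lambda x: int(x[0] > 0.5),  # thresholds the single feature
    1: lambda x: 1,                # always predicts class 1
}
accuracies = evaluate(population, dataset)
print(accuracies)
```

Because the per-chunk counts are simply added, the map calls are independent of each other, which is the property a MapReduce framework would exploit to run them on separate nodes.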
doi:10.1109/nabic.2013.6617851 dblp:conf/nabic/Al-MadiL13 fatcat:yaiutgclsfeajivcwpf52rkkxi