The classification of cancer based on DNA microarray data that uses diverse ensemble genetic programming

Jin-Hyuk Hong, Sung-Bae Cho
2006 Artificial Intelligence in Medicine  
Objective: The classification of cancer based on gene expression data is one of the most important procedures in bioinformatics. To obtain highly accurate results, ensemble approaches have been applied to the classification of DNA microarray data. Diversity is very important in these ensemble approaches, but conventional diversity measures are difficult to apply when only a few training samples are available. Key issues that need to be addressed under such circumstances are the
... of a new ensemble approach that can improve the classification of these datasets.
Materials and methods: An effective ensemble approach that uses diversity in genetic programming is proposed. Diversity is measured by comparing the structure of the classification rules rather than by estimating it from the classifiers' outputs.
Results: Experiments performed on common gene expression datasets (such as the lymphoma cancer, lung cancer and ovarian cancer datasets) demonstrate the performance of the proposed method relative to conventional approaches.
Conclusion: Diversity measured by comparing the structure of the classification rules obtained by genetic programming is useful for improving the performance of the ensemble classifier.

Definition 6 (An ensemble hypothesis). $EH = \{ eh \mid eh(x_i) = \mathrm{Majority\_vote}(t_1(x_i), \ldots, t_l(x_i)) \}$, where $t_j \in T$ and $l$ is the ensemble size.
Definition 7 (Volume of version space $V'$). $\mathrm{Vol}(t)$ is the size of the version space $V'$ that satisfies $t$.
Definition 8 (Accuracy of $t$).
$$\mathrm{Acc}(x_i) = \frac{\mathrm{Vol}(y_i t_j(x_i) > 0)}{\mathrm{Vol}(y_i t_j(x_i) > 0) + \mathrm{Vol}(y_i t_j(x_i) < 0)}$$
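The idea of choosing ensemble members by rule structure and then combining them by majority vote (Definition 6) can be illustrated with a small Python sketch. It is not the authors' implementation and rests on several assumptions: each genetic-programming rule is an arithmetic expression tree over gene-expression values, written as nested tuples; a rule votes for class +1 when its output is positive and -1 otherwise (following the sign convention $y_i t_j(x_i) > 0$ in Definition 8); and a Jaccard distance between the sets of subtrees of two rules stands in for the paper's structural diversity measure, whose exact form is not given here. The gene names and the functions `structural_distance`, `select_diverse`, `ensemble_diversity` and `majority_vote` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence, Union

# Hypothetical rule representation: a leaf is a gene name (str), an inner node
# is (operator, left, right), e.g. ("-", ("+", "g1", "g3"), "g2").
Tree = Union[str, tuple]


def subtrees(tree: Tree) -> set:
    """Collect every subtree of an expression tree, leaves included."""
    if isinstance(tree, str):
        return {tree}
    result = {tree}
    for child in tree[1:]:
        result |= subtrees(child)
    return result


def structural_distance(a: Tree, b: Tree) -> float:
    """Assumed structural measure: Jaccard distance between subtree sets.
    0.0 means identical structure, 1.0 means no shared subtree."""
    sa, sb = subtrees(a), subtrees(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


def ensemble_diversity(rules: Sequence[Tree]) -> float:
    """Mean pairwise structural distance over all rule pairs."""
    pairs = list(combinations(rules, 2))
    if not pairs:
        return 0.0
    return sum(structural_distance(a, b) for a, b in pairs) / len(pairs)


def select_diverse(rules: Sequence[Tree], size: int) -> list:
    """Greedily add the rule farthest (on average) from those already chosen.
    Assumes `rules` is sorted best-first, so rules[0] always enters."""
    chosen = [rules[0]]
    pool = list(rules[1:])
    while len(chosen) < size and pool:
        best = max(pool, key=lambda t: sum(structural_distance(t, c) for c in chosen) / len(chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen


def evaluate(tree: Tree, sample: dict) -> float:
    """Evaluate a rule on one sample mapping gene name -> expression value."""
    if isinstance(tree, str):
        return sample[tree]
    op, left, right = tree
    l, r = evaluate(left, sample), evaluate(right, sample)
    if op == "+":
        return l + r
    if op == "-":
        return l - r
    if op == "*":
        return l * r
    raise ValueError(f"unknown operator {op!r}")


def majority_vote(rules: Sequence[Tree], sample: dict) -> int:
    """Definition 6: each rule votes +1 (output > 0) or -1; the majority wins.
    Ties, possible only with an even ensemble size, go to +1 (an assumption)."""
    votes = Counter(1 if evaluate(t, sample) > 0 else -1 for t in rules)
    return 1 if votes[1] >= votes[-1] else -1


if __name__ == "__main__":
    rules = [("-", "g1", "g2"), ("-", ("+", "g1", "g3"), "g2"), ("-", "g4", "g5")]
    ensemble = select_diverse(rules, size=3)
    sample = {"g1": 2.0, "g2": 1.0, "g3": 0.5, "g4": 0.0, "g5": 3.0}
    print(majority_vote(ensemble, sample))  # -> 1
```

Because the distance compares rule structures rather than rule outputs, it can be computed without any held-out samples, which is the property the abstract motivates for small microarray training sets. The greedy selection is only one simple way to use such a measure; the paper's own selection procedure may differ.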
doi:10.1016/j.artmed.2005.06.002 pmid:16102956 fatcat:n2v6norbsfbkneaydxaymwgbou