Genetic heterogeneity analysis using genetic algorithm and network science

Zhendong Sha, Yuanzhu Chen, Ting Hu
2022 Proceedings of the Genetic and Evolutionary Computation Conference Companion  
Genome-wide association studies (GWAS) have linked thousands of genetic variants to the susceptibility of many common human diseases. However, the genetic explanations of diseases are often heterogeneous, imposing a substantial challenge for GWAS. We propose a feature construction method using genetic algorithm (GA) to recognize the heterogeneous risk effects of different genetic variable groups. Multiple GA-based feature selection runs are used to collect an ensemble of the high-performing
more » ... ure subsets. We generate a feature co-selection network from the ensemble, where nodes represent genetic variables and edges represent their co-selection frequencies. A new synthetic feature, namely community risk score (CRS), is created for each network community. CRS quantifies the risk of a community of variables and allows for more effective heterogeneity analysis. We applied our method to two colorectal cancer GWAS datasets, one for training and the other for validation. We ran the GA-based feature selection on the training dataset and constructed the co-selection network. CRS was then created for each community in the network. We identified three colorectal cancer subtypes using the CRSs and clustering algorithms on the validation dataset. The function enrichment analysis in our results further highlighted gastric cancer related genes, tumor suppressors and DNA methylation genes.
doi:10.1145/3520304.3529027 fatcat:5s2ln5mhhvbhbkm75x2lriyxtq