CGMDA: An approach to predict and validate MicroRNA-disease associations by utilizing Chaos game Representation and LightGBM

Kai Zheng, Lei Wang, Zhu-Hong You
2019 IEEE Access  
Recent studies have shown that microRNAs (miRNAs) play an important role in complex human diseases. Identifying potential miRNA-disease associations is useful for understanding the pathogenesis. However, only a few methods have been proposed to predict miRNA-disease associations based on sequence information, and these methods quantify only nonlinear sequence relationships without taking linear sequence information into account. In this work, we designed a computational method for predicting miRNA-disease associations based on chaos game representation, called CGMDA, to overcome these problems. CGMDA combines association information with miRNA sequence information, miRNA functional information and disease semantic information to improve prediction accuracy. In particular, we use chaos game representation (CGR) technology for the first time to transform miRNA sequence information into image information and extract its features. In the cross-validation experiment, CGMDA achieved a mean area under the receiver operating characteristic curve (AUC) of 0.9099 on the HMDD v3.0 data set. To better evaluate the performance of CGMDA, we compared it to different classifiers and related prediction methods. In addition, CGMDA was applied to three complex human diseases. In these case studies, of the top 40 predicted disease-related miRNAs, 39 (Breast Neoplasm), 39 (Lymphoma) and 38 (Colon Neoplasm) were validated by experiments. These experimental results show that CGMDA is a reliable tool and has potential application prospects in assisting early diagnosis and treatment of prognosis.

INDEX TERMS miRNAs, chaos game representation, disease, heterogenous information, LightGBM.

This work is licensed under a Creative Commons Attribution 4.0 License.
doi:10.1109/access.2019.2940470 fatcat:cjeremcxuvhbhh6i6ps6nhr4gy
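
The abstract states that CGR turns a miRNA sequence into image information. A minimal sketch of the generic chaos game construction is given here for orientation only. It is not the authors' implementation: the corner assignment, the grid size, the function names `cgr_points` and `cgr_image`, and the example sequence are all assumptions made for illustration, and the feature extraction step used in CGMDA is not reproduced.

```python
# Generic chaos game representation (CGR) of an RNA sequence.
# ASSUMPTION: this corner layout is a common convention, not taken from the paper.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: one (x, y) point per valid nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper().replace("T", "U"):
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_image(seq, size=8):
    """Bin the trajectory into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    example = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt RNA string
    for row in cgr_image(example, size=4):
        print(row)
```

Each point depends on the entire prefix of the sequence read so far, so the resulting grid encodes the order of nucleotides as well as their composition. The flattened grid could serve as one input to a classifier such as LightGBM, although how CGMDA derives its features from the image is not specified in the abstract.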