Accurate prognosis for localized prostate cancer through coherent voting networks and multi-omic data [article]

Marco Pellegrini
2022 medRxiv   pre-print
Prostate cancer is a very heterogeneous disease, from both a clinical and a biological/biochemical point of view, which makes stratifying patients into risk classes remarkably challenging. In particular, it is important to detect early, and to discriminate, the indolent forms of the disease from the aggressive ones, which require closer surveillance and timely treatment decisions.
Methods: We extend a recently developed supervised machine learning (ML) technique, called coherent voting networks (CVN), by incorporating a novel model-selection technique to counter model overfitting. The CVN method is then applied to the problem of predicting an accurate prognosis (with a time granularity of 1 year) for patients affected by prostate cancer. The CVN is developed on a discovery cohort of 495 patients from the TCGA-PRAD collection, and validated on several other independent cohorts, comprising a total of 744 patients.
Findings: We uncover seven multi-gene fingerprints, each comprising six to seven genes, that correspond to different input data types (mRNA expression, proteomic assays, or methylation) and different time points, for the event of progression-free survival (PFS) in patients diagnosed with prostate adenocarcinoma who had not received prior treatment for their disease. On the test set for the discovery cohort, we attain Odds Ratios ranging from a minimum of 12.0 to a maximum of 21.0, with average 16.8 and geometric mean p-value 0.01; Cohen kappa values ranging from a minimum of 0.29 to a maximum of 0.59, with average 0.47; and AUC ranging from a minimum of 0.62 to a maximum of 0.79, with average 0.72 and geometric mean p-value 0.01. Significant (< 0.05) p-values for the log-rank tests are found in six cases, with geometric mean p-value 0.0006. On seven independent cohorts, for 21 combinations of cohort vs fingerprint, we report Odds Ratios ranging from a minimum of 9.0 to a maximum of 40.0, with average 17.5 and geometric mean p-value 0.003; Cohen kappa values ranging from a minimum of 0.18 to a maximum of 0.65, with average 0.4; and AUC ranging from a minimum of 0.61 to a maximum of 0.88, with average 0.76 and geometric mean p-value 0.001. Many of the genes in our fingerprints have recorded prognostic power in some form of cancer, and have been studied for their functional roles in cancer on animal models or cell lines.
Interpretation: The development of novel ML techniques tailored to the problem of uncovering effective multi-gene prognostic biomarkers is a promising new line of attack for sharpening our capability to diversify and personalize cancer patient treatments. For the challenging problem of discriminating between indolent and aggressive types of non-metastatic prostate cancer, we show that it is possible to attain accurate prognostic prediction with a time granularity of one year, which is an improvement beyond the current state of the art.
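For readers less familiar with the summary statistics quoted in the Findings, the following is a minimal sketch of how an odds ratio, a Cohen kappa value, and a geometric mean of p-values can be computed for a binary prognostic classifier. It is not the author's code: the function names and the 2x2 counts (`tp`, `fp`, `fn`, `tn`) are hypothetical and chosen only for illustration, and the abstract does not state how the paper handles edge cases such as empty cells.

```python
import math


def odds_ratio(tp, fp, fn, tn):
    """Odds ratio of a 2x2 table: (tp * tn) / (fp * fn)."""
    if fp == 0 or fn == 0:
        # Assumption: no continuity correction is applied in this sketch.
        raise ValueError("odds ratio is undefined when fp or fn is zero")
    return (tp * tn) / (fp * fn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement between prediction and outcome beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of the table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def geometric_mean(values):
    """Geometric mean of strictly positive values (e.g. p-values)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical counts, NOT taken from the paper:
# tp = predicted progression and progressed, fp = predicted progression but did not,
# fn = predicted no progression but progressed, tn = predicted none and none occurred.
tp, fp, fn, tn = 12, 8, 4, 56

print("odds ratio:", odds_ratio(tp, fp, fn, tn))
print("Cohen kappa:", round(cohen_kappa(tp, fp, fn, tn), 3))
print("geometric mean p-value:", geometric_mean([0.001, 0.01, 0.1]))
```

The geometric mean is a natural way to summarize p-values that span several orders of magnitude, since an arithmetic mean would be dominated by the largest value.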
doi:10.1101/2022.07.28.22278156 fatcat:davv2sli7fbjhmhz7ewxjistle