A novel machine learning approach reveals latent vascular phenotypes predictive of renal cancer outcome

Nathan Ing, Fangjin Huang, Andrew Conley, Sungyong You, Zhaoxuan Ma, Sergey Klimov, Chisato Ohe, Xiaopu Yuan, Mahul B. Amin, Robert Figlin, Arkadiusz Gertych, Beatrice S. Knudsen
2017 Scientific Reports  
Gene expression signatures are commonly used as predictive biomarkers, but do not capture structural features within the tissue architecture. Here we apply a 2-step machine learning framework for quantitative imaging of tumor vasculature to derive a spatially informed, prognostic gene signature. The trained algorithms classify endothelial cells and generate a vascular area mask (VAM) in H&E micrographs of clear cell renal cell carcinoma (ccRCC) cases from The Cancer Genome Atlas (TCGA).
... cation of VAMs led to the discovery of 9 vascular features (9VF) that predicted disease-free survival in a discovery cohort (n = 64, HR = 2.3). Correlation analysis and information gain identified a 14-gene expression signature related to the 9VF. Two generalized linear models with elastic net regularization (14VF and 14GT), based on the 14 genes, separated independent cohorts of up to 301 cases into good and poor disease-free survival groups (14VF HR = 2.4, 14GT HR = 3.33). For the first time, we successfully applied digital image analysis and targeted machine learning to develop prognostic, morphology-based, gene expression signatures from the vascular architecture. This novel morphogenomic approach has the potential to improve previous methods for biomarker development.
Analytical strategies involving machine learning have recently been applied to biomarker discovery in digital pathology images. One machine learning approach, broadly called Deep Learning, involves automatic recognition of cancerous tissue with millions of discrete parameters, which hinders reasonable interpretation of the discriminative features. In another branch of machine learning, algorithms are developed to recognize specific predefined features in images and measure their abundance. The latter approach permits algorithm design informed by observations related to biological concepts, making it the preferred strategy for analyzing multicellular biological processes 1-15. Important prognostic information has been obtained from analysis of RNA expression data through measurements of pathways that drive cell-intrinsic biological mechanisms such as transcription factor activity, stemness, epithelial-to-mesenchymal transition, or neuronal differentiation 16. Unfortunately, this approach is confounded by the averaging of signals across heterogeneous cell types and across the ternary spatial organization of higher order structures from which the RNA is obtained.
These spatial relationships are indispensable to diagnostic interpretation by pathologists but are difficult to quantify without computational assistance. Recent computational and machine learning tools provide new opportunities to quantify the cellular composition and spatial organization of the tumor and its microenvironment (TME) 2,13. Despite the importance of angiogenesis in the TME for tumor growth and aggressiveness, the tumor vasculature has been incompletely represented by both image analysis and gene expression analysis. Since the tumor vasculature is a highly orchestrated network of branched tubular structures, it is useful as a model system to determine how higher order cellular structures may be captured through linking quantitative imaging with genomic data. In clear cell renal cell carcinoma (ccRCC), the most common subtype of renal cell carcinoma 17, excessive angiogenesis constitutes a pathognomonic diagnostic feature. It is caused by the loss of the von Hippel-Lindau tumor suppressor protein, VHL, which results in secretion of vascular endothelial growth factor (VEGF) 18.
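The abstract names information gain as one of the criteria used to relate genes to the vascular features, but this record does not describe how it was computed. As a generic illustration only, and not the authors' implementation, the sketch below shows the textbook definition: the reduction in label entropy after splitting cases at an expression threshold. The function names, the threshold, and the toy values are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction in `labels` after splitting cases at `threshold`.

    `values` holds one continuous measurement per case (for example a
    gene's expression level); `labels` holds the class of each case.
    """
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    if not low or not high:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    weighted = (len(low) / n) * entropy(low) + (len(high) / n) * entropy(high)
    return entropy(labels) - weighted


# Hypothetical toy data: six cases, two invented measurements, two risk groups.
risk_group = ["good", "good", "good", "poor", "poor", "poor"]
informative = [0.2, 0.4, 0.5, 1.8, 2.1, 2.5]    # separates the groups cleanly
uninformative = [1.0, 2.0, 1.1, 2.1, 1.2, 2.2]  # nearly unrelated to the groups

ig_informative = information_gain(informative, risk_group, threshold=1.0)
ig_uninformative = information_gain(uninformative, risk_group, threshold=1.5)
print(ig_informative, ig_uninformative)
```

In this toy setting the first measurement splits the two groups perfectly and so recovers the full label entropy, while the second recovers very little. Ranking candidate genes by such a score is one common way to shortlist features before fitting a regularized model.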
doi:10.1038/s41598-017-13196-4 pmid:29038551 pmcid:PMC5643431 fatcat:oo3lrxzp2fdnnheetbkwf3vicq