Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics

Erdogan Taskesen, Sjoerd M. H. Huisman, Ahmed Mahfouz, Jesse H. Krijthe, Jeroen de Ridder, Anja van de Stolpe, Erik van den Akker, Wim Verhaegh, Marcel J. T. Reinders
2016 Scientific Reports  
The use of genome-wide data in cancer research to identify groups of patients with similar molecular characteristics has become a standard approach in therapy-response prediction, prognosis prediction, and drug development. To make progress in these applications, the trend is to move from a single genome-wide measurement in a single cancer type towards measuring several different molecular characteristics across multiple cancer types. Although current approaches shed light on … molecular characteristics of various cancer types, detailed relationships between patients within cancer clusters are unclear. We propose a novel multi-omic integration approach that exploits the joint behavior of the different molecular characteristics, supports visual exploration of the data through a two-dimensional landscape, and allows inspection of the contribution of the different genome-wide data types. We integrated 4,434 samples across 19 cancer types, derived from TCGA, containing gene expression, DNA methylation, copy-number variation and microRNA expression data. Cluster analysis revealed 18 clusters, three of which contained a complex mixture of cancer types: squamous cell carcinomas, colorectal cancers, and a novel grouping of kidney cancers. Sixty-four samples were identified outside their tissue-of-origin cluster. Known and novel patient subgroups were detected for Acute Myeloid Leukemias and breast cancers. Quantification of the contributions of the different molecular data types showed that substructures are driven by specific (combinations of) molecular characteristics.

With the rapidly increasing availability of novel therapeutic options, such as targeted therapies for tumor-driving signal transduction pathways and revolutionary immunotherapies, there is an urgent clinical need to match these therapies to specific groups of patients in order to maximize patient benefit. Conventionally, cancer subtyping, prognosis assessment, and therapy choice for cancer patients are based on standard histopathology, such as staining for Ki67, ER, PR and HER2 in breast cancer 1, or on identifying EGFR, BRAF, and KRAS mutations in colorectal or lung cancer 2. High-throughput technologies, such as microarrays and next-generation sequencing, have opened new possibilities for biomarker discovery and cancer subtyping by moving from single-gene analysis to an analysis encompassing the whole genome and/or transcriptome 3,4.
For instance, transcriptional breast cancer signatures have been associated with clinical outcome 5. Remarkably, patient groups identified using genomic mutations and those identified using expression signatures often show poor concordance. This is apparent, for example, in Acute Myeloid Leukemias, where the largest group of patients has a normal karyotype with point mutations (e.g., FLT3-ITD, NPM1, IDH1/IDH2 or KRAS/NRAS), and these patients do not cluster by mutation status when gene expression profiles are used 6. A lack of cluster robustness across different molecular data types complicates treatment choice. Hence, there is a need for integrative analyses of genome-wide datasets across different molecular data types to reach a unified and more robust cancer subtyping.
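The text does not spell out how the molecular data types are combined. As a rough illustration only, the Python sketch below shows one simple early-integration scheme, which is an assumption and not necessarily the authors' method: each feature is z-scored within its data type, each data-type block is scaled by 1/sqrt(number of features) so that every data type carries the same total variance, and the blocks are treated as one concatenated feature space. It then splits the squared Euclidean distance between two samples into per-data-type fractions, a simple way to see which molecular characteristic separates them. The function names (`zscore_columns`, `integrate`, `distance_contributions`) and the toy values are hypothetical, and the two-dimensional embedding and clustering steps are omitted.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Standardize each feature (column) to zero mean and unit variance across samples."""
    scaled_cols = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        # Constant features carry no information; map them to zero.
        scaled_cols.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def integrate(omics):
    """Hypothetical early integration: z-score each data type, then weight each
    block by 1/sqrt(#features) so every data type has equal total variance.
    The returned blocks together form the concatenated feature space."""
    blocks = {}
    for name, matrix in omics.items():
        weight = 1.0 / sqrt(len(matrix[0]))
        blocks[name] = [[weight * x for x in row] for row in zscore_columns(matrix)]
    return blocks


def distance_contributions(blocks, i, j):
    """Fraction of the squared Euclidean distance between samples i and j
    (in the concatenated space) that comes from each data type."""
    sq = {name: sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
          for name, rows in blocks.items()}
    total = sum(sq.values())
    return {name: (v / total if total > 0 else 0.0) for name, v in sq.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 3 samples, 2 expression features, 1 methylation feature.
    omics = {
        "expression": [[1.0, 2.0], [1.0, 2.0], [3.0, 5.0]],
        "methylation": [[0.1], [0.9], [0.5]],
    }
    blocks = integrate(omics)
    # Samples 0 and 1 have identical expression, so methylation explains all of their distance.
    print(distance_contributions(blocks, 0, 1))  # prints {'expression': 0.0, 'methylation': 1.0}
```

Weighting by block size is one design choice among many; without it, a data type with many more features (for example, methylation probes versus microRNAs) would dominate the joint distances.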
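Concordance between two groupings of the same patients, such as mutation-defined groups versus expression-based clusters, is commonly quantified with the adjusted Rand index (ARI). The source does not say which measure, if any, underlies the concordance statement; the sketch below is a standard pure-Python ARI offered as one possible yardstick, and the patient labels in the example are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two groupings of the same samples.

    Returns 1.0 for identical partitions (label names do not matter),
    values near 0.0 for chance-level agreement, and can be negative.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both groupings must label the same samples")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: nothing to compare
    # Pairs of samples grouped together in both labelings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each labeling separately (table marginals).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put every sample in one group
    return (together_both - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical labels for six patients.
    mutation_groups = ["NPM1", "NPM1", "FLT3-ITD", "FLT3-ITD", "IDH1", "IDH1"]
    expression_clusters = [0, 1, 0, 1, 2, 2]
    print(adjusted_rand_index(mutation_groups, expression_clusters))
```

A value close to 0 would indicate that mutation status and expression clusters agree little more than expected by chance, which is the kind of poor concordance described above.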
doi:10.1038/srep24949 pmid:27109935 pmcid:PMC4842960