Peer Review #1 of "Gene signatures with predictive and prognostic survival values in human osteosarcoma (v0.1)" [peer_review]

YH Kim
2021 unpublished
Osteosarcoma (OS) is a common malignancy seen mainly in children and adolescents. The disease is characterized by poor overall prognosis and low survival due to a lack of predictive markers. Many gene signatures with diagnostic, prognostic, and predictive value have been evaluated with the aim of achieving better clinical outcomes. Two public data sources, GSE21257 and UCSC Xena, were used to identify the minimum number of robust genes needed for a predictive signature to guide prognosis of patients with osteosarcoma. The lasso regression algorithm was used to analyze sequencing data from TCGA-TARGET. Methods such as Cox regression analysis, risk factor scoring, receiver operating characteristic (ROC) curves, KMplot prognosis analysis, and nomograms were then used to characterize the prognostic and predictive power of the identified genes. Their utility was assessed using the GEO osteosarcoma dataset. Finally, functional enrichment analysis of the identified genes was performed. A twenty-gene signature was found to have good prognostic value for predicting patient survival. Gene ontology analysis showed that the key genes related to osteosarcoma were categorized as peptide-antigen binding, clathrin-coated endocytic vesicle membrane, peptide binding, and MHC class II protein complex. The osteosarcoma-related genes in these modules were significantly enriched in antigen processing and presentation, phagocytosis, cell adhesion molecules, and Staphylococcus aureus infection. The twenty-gene signature identified here may help predict the prognosis of patients with OS, and it can further be used to determine the subtypes of osteosarcoma.
PeerJ reviewing PDF | (2020:07:50718:1:2:NEW 19 Nov 2020) Manuscript to be reviewed
... establishment of treatment goals. Serum biomarkers are used for predicting the prognosis of other cancers but are rarely characterized in OS (Zamborsky et al., 2019). An obvious need in OS is effective biomarkers for characterizing disease progression and the associated prognosis.
One study reported that CDC20 and its downstream substrates securin, cyclin A2, and cyclin B2 are good prognostic factors for OS. Savage et al. (Jin et al., 2007) suggested that two loci, one in the GRM4 gene at 6p21.3 and one in a gene desert at 2p25.2, warrant further exploration to uncover the biological mechanisms underlying susceptibility to osteosarcoma. The study addressed a single gene and did not take into account interactions among the molecules that regulate tumorigenesis. Another study found three candidate genes (ALOX5AP, CD74, and FCGR2A) whose expression levels in lung and lymph node tissue were higher than in matched cancer tissues, suggesting that they may be expressed in the microenvironment (Li et al., 2020).
These studies have several limitations. First, accuracy cannot be guaranteed with only one dataset because of the expected high false-positive rate. Furthermore, results obtained with a single high-throughput analysis method (only sequencing or only chip data) will be biased. Second, the patient sample data are too limited. Finally, the clinical information is incomplete.
The objective of this study was to identify the minimum number of robust genes needed to produce a predictive prognostic signature for patients with OS. The lasso regression algorithm was used to analyze sequencing data from TCGA-TARGET. Cox regression analysis, risk factor scores, receiver operating characteristic (ROC) curves, KMplot prognosis analysis, nomograms, and other methods were then used to assess the genes for their predictive power. Next, the accuracy and predictive power of the twenty genes identified in this process were assessed using the GEO OS dataset. Finally, we performed functional enrichment analysis on these twenty genes.
2 Material and Methods
2.1 Data collection and preprocessing
Training set: The TARGET-OS RNA-sequencing dataset (presented as fragments per kilobase of transcript per million mapped reads, FPKM), together with the corresponding clinical characteristics and prognosis information, was downloaded from UCSC Xena (Goldman et al., 2019) (https://xena.ucsc.edu/).
Patients with expression profiles but no prognostic information or clinical characteristics were excluded, leaving 84 patients with OS in the training set. FPKM data were converted to TPM data and annotated using gencode.v22.annotation.gene.probeMap.
Validation set: The gene expression data GSE21257 (Buddingh et al., 2011) (GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array) for 53 patients with OS were downloaded from the GEO database, then corrected and annotated with R software.
Construction of gene signatures
A multiple linear regression model based on the underlying gene expression levels was developed to compute prognostic risk scores. Parameters for the lasso Cox model were selected by 10-fold cross-validation. Patients with OS were divided into high-risk and low-risk groups according to the median cutoff value of the risk score (the cutoff value refers to the value before the brackets of the HR in each dataset). Model prediction efficiency was evaluated in the training set and in the validation and test sets using the Kaplan-Meier log-rank test, time-dependent ROC curve analysis, Cox regression analysis, and risk factor scores. A nomogram was constructed using Iasso's guidelines.
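The risk-score and grouping steps can be illustrated with a short, self-contained sketch. It is not the authors' code. It assumes the usual lasso Cox formulation, in which a patient's risk score is the linear predictor: the sum of each selected gene's coefficient multiplied by its expression. The genes `GENE_1` and `GENE_2`, their coefficients, and the survival data are hypothetical toy values, not results from this study. Patients are split at the median score, with ties assigned to the low-risk group as an assumption. The two groups are then compared with a two-group log-rank test, the test behind a Kaplan-Meier group comparison. Fitting the coefficients themselves (the lasso Cox model with 10-fold cross-validation) is not shown.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple


def risk_score(expr: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Cox linear predictor: sum of (coefficient x expression) over signature genes."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())


def split_by_median(scores: Sequence[float]) -> List[str]:
    """Label scores above the median 'high'; ties at the median go to 'low' (an assumption)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[str], group_a: str = "high") -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    observed_a = expected_a = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for i in at_risk if groups[i] == group_a)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_a = sum(1 for i in at_risk
                  if times[i] == t and events[i] and groups[i] == group_a)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    stat = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical two-gene signature and toy cohort (illustrative values only).
coefs = {"GENE_1": 0.8, "GENE_2": -0.5}
expression = [
    {"GENE_1": 2.0, "GENE_2": 1.0},
    {"GENE_1": 1.5, "GENE_2": 0.5},
    {"GENE_1": 3.0, "GENE_2": 1.0},
    {"GENE_1": 0.5, "GENE_2": 2.0},
    {"GENE_1": 1.0, "GENE_2": 3.0},
    {"GENE_1": 0.2, "GENE_2": 1.0},
]
times = [10, 15, 8, 40, 50, 30]   # follow-up in months
events = [1, 1, 1, 0, 1, 0]       # 1 = death observed, 0 = censored

scores = [risk_score(e, coefs) for e in expression]
groups = split_by_median(scores)
stat, p_value = logrank_test(times, events, groups)
print(groups, round(stat, 3), round(p_value, 4))
```

In R, these steps are commonly done with `cv.glmnet(..., family = "cox")` from the glmnet package and `survdiff()` from the survival package. The hand-written test here only makes the arithmetic of the log-rank comparison visible.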
doi:10.7287/peerj.10633v0.1/reviews/1 fatcat:vnrrk2ugvrb73nse6ivzp6nkou