Bioinformatics analysis based on the results of LC-MS/MS v1 [dataset]

Lihui Li
1. Cluster 3.0 and Java TreeView were used to perform hierarchical clustering analysis on the relative protein expression data. Euclidean distance was selected as the similarity measure and average linkage as the clustering algorithm. A heatmap was created as a visual aid alongside the dendrogram.

2. The sequences of the differentially expressed proteins were retrieved in batches from the UniProtKB database in FASTA format. The NCBI BLAST+ software was then used to find homologous sequences and transfer their functional annotations to the studied sequences. These annotations fall into three major categories: biological process (BP), molecular function (MF), and cellular component (CC). The top 20 BLAST hits (E-value cutoff 1e-3) for each query sequence were retrieved and loaded into Blast2GO [10] (Version 3.3.5) for Gene Ontology (GO) mapping and annotation. For the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, the FASTA sequences of the proteins differentially altered between the two groups were searched with BLAST against the online KEGG database, and KEGG [12] was subsequently used to map them to pathways. The significantly overrepresented pathways and GO terms were extracted and classified. Enrichment analysis was performed to further explore the impact of the differentially expressed proteins on cellular physiological processes and to uncover the relationships among them. GO enrichment in the three categories and KEGG pathway enrichment were assessed with Fisher's exact test, using the annotations of all quantified proteins as the background dataset. The resulting p-values were adjusted for multiple testing with the Benjamini-Hochberg correction. Only functional categories and pathways with p-values ≤ 0.05 were considered significant.

3. Protein-protein interaction (PPI) information for the studied proteins was retrieved from the IntAct molecular interaction database by gene symbol, or with the STRING software. The results were downloaded in XGMML format and imported into Cytoscape to visualize and further analyze the functional PPI networks. To evaluate the importance of each protein in the PPI network, its degree was calculated.
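The clustering step in the protocol was run in Cluster 3.0, not in code. As an illustration only, the same combination of Euclidean distance and average linkage can be sketched in plain Python. The function names and the expression profiles below are hypothetical and are not taken from the dataset.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping a protein name to its list of expression values.
    Returns the merge history as (members_a, members_b, distance) tuples,
    in the order the merges happened.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            total = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b)
            d = total / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges


# Hypothetical relative-expression profiles (three samples per protein).
profiles = {
    "P1": [1.0, 2.0, 3.0],
    "P2": [1.0, 2.0, 4.0],
    "P3": [5.0, 5.0, 5.0],
}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

The merge history is the information a dendrogram draws: each tuple is one join, and the distance gives the height of that join. This naive version recomputes all pairwise distances on every pass, so it is only suitable for small examples.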
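The enrichment test and the multiple-testing correction described in step 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the protocol does not say whether the Fisher's exact test was one-sided or two-sided, so the code assumes a one-sided test for over-representation, and the function names and example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail).

    k: differentially expressed proteins annotated with the term
    n: all differentially expressed proteins
    K: background (quantified) proteins annotated with the term
    N: all background (quantified) proteins
    """
    total = comb(N, n)
    upper = min(n, K)
    # Probability of seeing k or more annotated proteins by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 3 of 5 changed proteins carry a term that
# annotates 4 of the 20 quantified proteins.
p_single = fisher_enrichment_p(k=3, n=5, K=4, N=20)

# Hypothetical raw p-values for four terms, adjusted together.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(p_single, adjusted)
```

Using all quantified proteins as `N` mirrors the background choice stated in the protocol: a term is only called enriched relative to what the experiment could have detected, not relative to the whole proteome.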
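The degree calculation in step 3 was done in Cytoscape. For illustration, the degree of a protein in an undirected PPI network is the number of distinct partners it interacts with, which can be sketched as below. The function name and the gene symbols are hypothetical placeholders, and ignoring self-interactions and duplicate edges is an assumption of this sketch.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count distinct interaction partners per protein (undirected network)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # Assumption: self-interactions do not add to the degree.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {protein: len(partners) for protein, partners in neighbours.items()}


# Hypothetical interaction pairs; ("GENE_B", "GENE_A") duplicates the first edge.
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_B", "GENE_A"),
    ("GENE_C", "GENE_D"),
]
degrees = node_degrees(edges)

# Rank proteins from most to least connected.
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
print(ranked)
```

Proteins with the highest degree are the hubs of the network, which is why degree serves as a simple measure of a protein's importance.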
doi:10.17504/ fatcat:omcodfglcfa6tea4axdvvmtsxa