Protein Bioinformatics Databases and Resources [chapter]

Chuming Chen, Hongzhan Huang, Cathy H. Wu
2017 Msphere  
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.

Protein bioinformatics databases can be primarily classified as sequence databases, 2D gel databases, 3D structure databases, chemistry databases, enzyme and pathway databases, family and domain databases, gene expression databases, genome annotation databases, organism-specific databases, phylogenomic databases, polymorphism and mutation databases, protein-protein interaction databases, proteomic databases, PTM databases, ontologies, specialized protein databases, and other (miscellaneous) databases. Please visit http://proteininformationresource.org/staff/chenc/MiMB/dbSummary2015.html to access the databases reviewed in this chapter through their corresponding web addresses (URLs). For many of these databases, identifiers can be mapped to UniProtKB protein AC/IDs [7]. Our coverage of protein bioinformatics databases in this chapter is by no means exhaustive. Our intention is to cover databases that are recent, high quality, publicly available, and expected to be of interest to many users in the community. It is worth noting that certain databases can be classified into more than one category. As an update to our previously contributed MiMB series chapter [8], we now focus on databases that are aligned with the content of this book and emphasize the types of data stored and the related data access and data analysis support. For each category of databases listed in Table 1, we select some representatives and describe them briefly in section 2. In section 3, we discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. We conclude the chapter in section 4.
doi:10.1007/978-1-4939-6783-4_1 pmid:28150231 pmcid:PMC5506686 fatcat:rrclcq47p5hqrlffz2v2xexjoy