Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized, and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, the growth of the biomedical literature raises the question of their sustainability. Using UniProtKB/Swiss-Prot as a case study, we address this question with different literature triage approaches. With the assistance of the PubTator text-mining tool, we tagged more than 10,000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that counting curated publications alone is insufficient to provide a complete picture. We show that a large fraction of the papers found in PubMed is not relevant for curation in UniProtKB/Swiss-Prot and demonstrate that, despite appearances, expert curation is sustainable.
doi:10.1101/094011 fatcat:7mstunijrnbohhm7stdyv4wxve
We all know that the 'one gene, one protein' dogma is obsolete. A functional protein, and likewise its ultimate function, depends not only on the underlying genetic information but also on the current state of the cellular system. The transcript, like the polypeptide, is frequently processed in multiple ways, yet only one or a few of a multitude of possible variants are produced at any given time. An overview of the processes that can lead to sequence variety and structural diversity in eukaryotes is given. The UniProtKB/Swiss-Prot protein knowledgebase provides a wealth of information on protein variety, function and associated disorders. Examples of such annotations are shown, and further ones are available at http://www.expasy.org/sprot/tutorial/examples_CRB. To cite this article: B. Boeckmann et al., C. R. Biologies 328 (2005). © 2005 Académie des sciences. Published by Elsevier SAS. All rights reserved.
Abstract (French version, translated): One gene, several proteins: Swiss-Prot annotation in the biological context. It is now evident to everyone that the 'one gene, one protein' dogma is obsolete. During the synthesis of a functional protein, the transcript and the polypeptide chain can be modified in multiple ways. These modifications directly affect the biological function of the protein and depend not only on the genetic information but also on the conditions in which the cell finds itself: only a limited number of protein isoforms is produced in a given cell at a given time. This article gives a brief inventory of the biological processes involved in producing different proteins from the same gene in eukaryotes, together with a description of the resulting structural and functional diversity. The UniProtKB/Swiss-Prot protein knowledgebase is particularly rich in information describing the origin of differences between protein sequences derived from the same gene, post-translational modifications, and the consequences of this variability for protein function(s) and, where applicable, the associated diseases. Many examples …
doi:10.1016/j.crvi.2005.06.001 pmid:16286078 fatcat:m72bsba2enfxdm5xue4shdtnua
Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of the biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches.
Results: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that counting curated publications alone gives an incomplete picture: 8000–10 000 papers are curated in UniProt each year, while curators evaluate 50 000–70 000 papers per year. We show that 90% of the papers in PubMed are outside the scope of UniProt, that at most 2–3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable.
Availability and implementation: UniProt is freely available at http://www.uniprot.org/.
doi:10.1093/bioinformatics/btx439 pmid:29036270 pmcid:PMC5860168 fatcat:nofpf4cf5fgdjmti4oa6igus5q
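The Results paragraph gives two yearly ranges: papers curated and papers evaluated. The small Python sketch below turns them into a range for the curated-to-evaluated ratio. It assumes the ratio is bounded by pairing the extremes of each range; this bounding is an assumption made here for illustration, not a figure reported by the authors.

```python
def ratio_bounds(curated, evaluated):
    """Return (lowest, highest) curated/evaluated ratio for two (min, max) ranges."""
    low = curated[0] / evaluated[1]   # fewest curated over most evaluated
    high = curated[1] / evaluated[0]  # most curated over fewest evaluated
    return low, high


# Yearly ranges quoted in the abstract (papers per year).
CURATED = (8_000, 10_000)
EVALUATED = (50_000, 70_000)

low, high = ratio_bounds(CURATED, EVALUATED)
print(f"curated / evaluated: {low:.1%} to {high:.1%}")
# curated / evaluated: 11.4% to 20.0%
```

Under this assumption, curators evaluate roughly 5 to 9 papers for every paper they end up curating, which is why counting curated publications alone understates the workload.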
The IntAct molecular interaction database has created a new, free, open-source, manually curated resource, the Complex Portal (www.ebi.ac.uk/intact/complex), through which protein complexes from major model organisms are being collated and made available for search, viewing and download. It has been built in close collaboration with other bioinformatics services and populated with data from ChEMBL, MatrixDB, PDBe, Reactome and UniProtKB. Each entry contains information about the participating molecules (including small molecules and nucleic acids), their stoichiometry, topology and structural assembly. Complexes are annotated with details about their function, properties and complex-specific Gene Ontology (GO) terms. Consistent nomenclature is used throughout the resource, with systematic names, recommended names and a list of synonyms all provided. The Evidence Code Ontology is used to indicate whether direct experimental evidence is available for an entry or whether the complex has been inferred from homology or orthology. The data are searchable using standard identifiers, such as UniProt, ChEBI and GO IDs, as well as protein, gene and complex names or synonyms. This reference resource will be maintained and will grow to encompass an increasing number of organisms. Input from groups and individuals with specific areas of expertise is welcome.
doi:10.1093/nar/gku975 pmid:25313161 pmcid:PMC4384031 fatcat:afmhddxkfbg7rm2rp5yjgj4fyi
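The entry contents described above map naturally onto a small record type. The Python sketch below is hypothetical: the class and field names, the placeholder identifiers, and the single free-text `evidence` field (standing in for an Evidence Code Ontology term) are illustrative assumptions, not the Complex Portal's actual schema or search API. It shows how one entry can be found by any of its identifiers, names or synonyms.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    identifier: str     # hypothetical placeholder for a UniProt or ChEBI ID
    name: str           # protein, gene or small-molecule name
    stoichiometry: int  # number of copies of this molecule in the complex


@dataclass
class ComplexEntry:
    complex_id: str
    recommended_name: str
    systematic_name: str
    synonyms: list[str] = field(default_factory=list)
    participants: list[Participant] = field(default_factory=list)
    go_terms: list[str] = field(default_factory=list)
    # Simplified stand-in for an Evidence Code Ontology term (assumption).
    evidence: str = "experimental"  # or "inferred from homology/orthology"

    def search_keys(self) -> set[str]:
        """Every identifier or name the entry can be found by, lower-cased."""
        keys = {self.complex_id, self.recommended_name, self.systematic_name}
        keys.update(self.synonyms)
        keys.update(self.go_terms)
        for p in self.participants:
            keys.update((p.identifier, p.name))
        return {k.lower() for k in keys}


def search(entries: list[ComplexEntry], query: str) -> list[ComplexEntry]:
    """Exact, case-insensitive match against identifiers, names and synonyms."""
    q = query.lower()
    return [e for e in entries if q in e.search_keys()]


# Placeholder entry; none of these values come from the Complex Portal.
entry = ComplexEntry(
    complex_id="CPX-EXAMPLE-1",
    recommended_name="Example heterodimer",
    systematic_name="ProtA:ProtB",
    synonyms=["AB complex"],
    participants=[
        Participant("UNIPROT-PLACEHOLDER-A", "ProtA", 1),
        Participant("UNIPROT-PLACEHOLDER-B", "ProtB", 1),
    ],
    go_terms=["GO-PLACEHOLDER"],
    evidence="inferred from homology/orthology",
)

print([e.complex_id for e in search([entry], "protb")])  # ['CPX-EXAMPLE-1']
```

Exact matching against a key set keeps the sketch short; a production search service would presumably also handle partial matches and ranking.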
Nature Methods, Vol. 9, No. 4, April 2012, p. 345 (Perspective).
The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.
Protein-protein interactions are a key element in our understanding of molecular biology. However, in contrast to areas of activity such as DNA sequencing or protein structural analysis, the systematic capture of published molecular interaction data into public-domain repositories is still in its infancy. This is not due to a lack of resources in this domain: as of December 2011, the PathGuide resource [1] listed more than 100 protein-protein interaction-related databases. Although many of these databases focus on predicting potential interactions or on mapping interologs rather than on experimentally determined interactions, the extent of activity suggests ample resources. However, most of these resources are independently funded and pursue their goals in isolation. As a result, accessing all publicly available molecular interaction data, even on a specific biological or biomedical topic, is a challenging, time-consuming task: the user must query multiple resources, each with a different interface, and many resources use different identifiers and often contain redundant data from overlapping sets of publications.
Efforts to address this problem began ten years ago with the development of a common file format for representing protein-interaction data. The 'minimum information about a molecular interaction experiment' (MIMIx) guidelines had been published by then [2], defining a list of the information to be supplied when …
doi:10.1038/nmeth.1931 pmid:22453911 pmcid:PMC3703241 fatcat:ev2rpzb3c5hkpo52af3rfglfou
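The abstract describes a central registry that manages which articles enter the shared dataset, so that curation effort is not duplicated and users see a nonredundant set of interactions. The Python sketch below illustrates that idea under loudly stated assumptions: a "first claim wins" policy, placeholder database and publication names, and interactors already mapped to a common identifier space. The class and function names are hypothetical, and the actual IMEx registry rules are not described in this text.

```python
from typing import Optional


class ArticleRegistry:
    """Hypothetical central registry: each article is curated by one member database."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # publication ID -> claiming database

    def claim(self, pub_id: str, database: str) -> bool:
        """Register a claim; returns True only if `database` owns the article."""
        return self._owner.setdefault(pub_id, database) == database

    def owner(self, pub_id: str) -> Optional[str]:
        return self._owner.get(pub_id)


def merge_nonredundant(registry, records):
    """Combine interaction records from several databases into one nonredundant list.

    Each record is (database, pub_id, interactor_a, interactor_b); interactors are
    assumed to be mapped to a shared identifier space already.
    """
    merged = {}
    for database, pub_id, a, b in records:
        if registry.owner(pub_id) != database:
            continue  # article is curated by another member (or unregistered)
        key = (pub_id, frozenset((a, b)))  # A-B and B-A count as one interaction
        merged.setdefault(key, (database, pub_id, a, b))
    return list(merged.values())


registry = ArticleRegistry()
print(registry.claim("PUB-1", "DB-A"))  # True: first claim wins
print(registry.claim("PUB-1", "DB-B"))  # False: already owned by DB-A

records = [
    ("DB-A", "PUB-1", "X", "Y"),
    ("DB-A", "PUB-1", "Y", "X"),  # same pair, reversed order
    ("DB-B", "PUB-1", "X", "Y"),  # duplicate curation by a non-owner
]
print(merge_nonredundant(registry, records))  # [('DB-A', 'PUB-1', 'X', 'Y')]
```

Filtering by article ownership before deduplicating pairs mirrors the two problems named in the text: redundant curation of the same publications and redundant records across resources.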
IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open-data molecular interaction database populated with data either curated from the literature or obtained from direct data depositions. IntAct has developed a sophisticated web-based curation tool capable of supporting both IMEx- and MIMIx-level curation. This tool is now used by several additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium.
doi:10.1093/nar/gkt1115 pmid:24234451 pmcid:PMC3965093 fatcat:5qemzevwgzdjln3y4pxvgfz7aa
A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small-scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the …
doi:10.1038/nbt926 pmid:14755292 fatcat:45igrs4skjf75ckj5xfyosfj5u