8 Hits in 0.6 sec

On expert curation and sustainability: UniProtKB/Swiss-Prot as a case study [article]

Sylvain Poux, Cecilia N. Arighi, Michele Magrane, Alex Bateman, Chih-Hsuan Wei, Zhiyong Lu, Emmanuel Boutet, Hema Bye-A-Jee, Maria Livia Famiglietti, Bernd Roechert
2016 bioRxiv   pre-print
Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized, and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, the question of their sustainability is raised due to the growth of biomedical literature. By using UniProtKB/Swiss-Prot as a case study, we address this question by using different literature triage approaches. With the assistance of the PubTator text-mining tool, we tagged more than 10,000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture. We show that a large fraction of published papers found in PubMed is not relevant for curation in UniProtKB/Swiss-Prot and demonstrate that, despite appearances, expert curation is sustainable.
doi:10.1101/094011 fatcat:7mstunijrnbohhm7stdyv4wxve

Protein variety and functional diversity: Swiss-Prot annotation in its biological context

Brigitte Boeckmann, Marie-Claude Blatter, Livia Famiglietti, Ursula Hinz, Lydie Lane, Bernd Roechert, Amos Bairoch
2005 Comptes rendus. Biologies  
We all know that the dogma 'one gene, one protein' is obsolete. A functional protein and, likewise, a protein's ultimate function depend not only on the underlying genetic information but also on the ongoing conditions of the cellular system. Frequently the transcript, like the polypeptide, is processed in multiple ways, but only one or a few out of a multitude of possible variants are produced at a time. An overview of processes that can lead to sequence variety and structural diversity in eukaryotes is given. The UniProtKB/Swiss-Prot protein knowledgebase provides a wealth of information regarding protein variety, function and associated disorders. Examples of such annotation are shown and further ones are available at http://www.expasy.org/sprot/tutorial/examples_CRB. To cite this article: B. Boeckmann et al., C. R. Biologies 328 (2005). © 2005 Académie des sciences. Published by Elsevier SAS. All rights reserved. Summary: One gene, several proteins: Swiss-Prot annotation in its biological context. It is now clear to everyone that the 'one gene, one protein' dogma is obsolete. During the synthesis of a functional protein, the transcript and the polypeptide chain can be modified in many ways. These modifications directly affect the biological function of the protein and depend not only on the genetic information but also on the conditions in which the cell finds itself: a limited number of protein isoforms is produced in a given cell at a given moment. This article gives a brief inventory of the biological processes involved in forming different proteins from a single gene in eukaryotes, together with a description of the resulting structural and functional diversity. The UniProtKB/Swiss-Prot protein knowledgebase is particularly rich in information describing the origin of differences between protein sequences derived from the same gene, post-translational modifications, and the consequences of this variability for their function(s) and, where applicable, associated diseases. Numerous examples
doi:10.1016/j.crvi.2005.06.001 pmid:16286078 fatcat:m72bsba2enfxdm5xue4shdtnua

On expert curation and scalability: UniProtKB/Swiss-Prot as a case study

Sylvain Poux, Cecilia N Arighi, Michele Magrane, Alex Bateman, Chih-Hsuan Wei, Zhiyong Lu, Emmanuel Boutet, Hema Bye-A-Jee, Maria Livia Famiglietti, Bernd Roechert, The UniProt Consortium, Janet Kelso
2017 Bioinformatics  
Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. Results: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture, as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. Availability and implementation: UniProt is freely available at http://www.uniprot.org/.
doi:10.1093/bioinformatics/btx439 pmid:29036270 pmcid:PMC5860168 fatcat:nofpf4cf5fgdjmti4oa6igus5q

The complex portal - an encyclopaedia of macromolecular complexes

Birgit H.M. Meldal, Oscar Forner-Martinez, Maria C. Costanzo, Jose Dana, Janos Demeter, Marine Dumousseau, Selina S. Dwight, Anna Gaulton, Luana Licata, Anna N. Melidoni, Sylvie Ricard-Blum, Bernd Roechert (+6 others)
2014 Nucleic Acids Research  
The IntAct molecular interaction database has created a new, free, open-source, manually curated resource, the Complex Portal (www.ebi.ac.uk/intact/complex), through which protein complexes from major model organisms are being collated and made available for search, viewing and download. It has been built in close collaboration with other bioinformatics services and populated with data from ChEMBL, MatrixDB, PDBe, Reactome and UniProtKB. Each entry contains information about the participating molecules (including small molecules and nucleic acids), their stoichiometry, topology and structural assembly. Complexes are annotated with details about their function, properties and complex-specific Gene Ontology (GO) terms. Consistent nomenclature is used throughout the resource with systematic names, recommended names and a list of synonyms all provided. The use of the Evidence Code Ontology allows us to indicate for which entries direct experimental evidence is available or if the complex has been inferred based on homology or orthology. The data are searchable using standard identifiers, such as UniProt, ChEBI and GO IDs, protein, gene and complex names or synonyms. This reference resource will be maintained and grow to encompass an increasing number of organisms. Input from groups and individuals with specific areas of expertise is welcome.
doi:10.1093/nar/gku975 pmid:25313161 pmcid:PMC4384031 fatcat:afmhddxkfbg7rm2rp5yjgj4fyi

Protein interaction data curation: the International Molecular Exchange (IMEx) consortium

Sandra Orchard, Samuel Kerrien, Sara Abbani, Bruno Aranda, Jignesh Bhate, Shelby Bidwell, Alan Bridge, Leonardo Briganti, Fiona S L Brinkman, Gianni Cesareni, Andrew Chatr-aryamontri, Emilie Chautard (+19 others)
2012 Nature Methods  
Perspective, Nature Methods, vol. 9, no. 4, April 2012, p. 345. The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices. Protein-protein interactions are a key element in our understanding of molecular biology. However, in contrast to areas of activity such as DNA sequencing or protein structural analysis, the systematic capture of published molecular interaction data into public domain repositories is still in its infancy. This is not due to lack of resources in this domain. As of December 2011, the PathGuide resource¹ listed more than 100 protein-protein interaction-related databases. Although many of these databases focus on predictions of potential interactions or on mapping interologs, rather than experimentally determined interactions, the extent of activity suggests ample resources. However, most of these resources are independently funded and pursue their goals in isolation. As a result, accessing all publicly available molecular interaction data, even on a specific biological or biomedical topic, is a challenging, time-consuming task that requires the user to query multiple resources, each with a different interface; additionally, many resources use different identifiers and often contain redundant data from overlapping sets of publications. Efforts to address this problem began ten years ago with the development of a common file format for representing protein-interaction data. The 'minimum information about a molecular interaction experiment' (MIMIx) guidelines had been published then², defining a list of the information to be supplied when
doi:10.1038/nmeth.1931 pmid:22453911 pmcid:PMC3703241 fatcat:ev2rpzb3c5hkpo52af3rfglfou

The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases

Sandra Orchard, Mais Ammari, Bruno Aranda, Lionel Breuza, Leonardo Briganti, Fiona Broackes-Carter, Nancy H. Campbell, Gayatri Chavali, Carol Chen, Noemi del-Toro, Margaret Duesbury, Marine Dumousseau (+23 others)
2013 Nucleic Acids Research  
IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium
doi:10.1093/nar/gkt1115 pmid:24234451 pmcid:PMC3965093 fatcat:5qemzevwgzdjln3y4pxvgfz7aa

The HUPO PSI's Molecular Interaction format—a community standard for the representation of protein interaction data

Henning Hermjakob, Luisa Montecchi-Palazzi, Gary Bader, Jérôme Wojcik, Lukasz Salwinski, Arnaud Ceol, Susan Moore, Sandra Orchard, Ugis Sarkans, Christian von Mering, Bernd Roechert, Sylvain Poux (+27 others)
2004 Nature Biotechnology  
A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the
doi:10.1038/nbt926 pmid:14755292 fatcat:45igrs4skjf75ckj5xfyosfj5u

KIAA1109 Variants Are Associated with a Severe Disorder of Brain Development and Arthrogryposis

Lucie Gueneau, Richard J. Fish, Hanan E. Shamseldin, Norine Voisin, Frédéric Tran Mau-Them, Egle Preiksaitiene, Glen R. Monroe, Angeline Lai, Audrey Putoux, Fabienne Allias, Qamariya Ambusaidi, Laima Ambrozaityte (+27 others)
2018 American Journal of Human Genetics  
doi:10.1016/j.ajhg.2017.12.002 pmid:29290337 pmcid:PMC5777449 fatcat:ns3coapbe5erbfvqirk2rgmjmm