Removing near-neighbour redundancy from large protein sequence collections

L. Holm, C. Sander
1998 Bioinformatics  
Availability: A regularly updated non-redundant protein sequence database (nrdb90), a server for homology searches against nrdb90, and a Perl script ( implementing the algorithm are available  ...  reduction of 46%, from 260 000 unique sequences to 140 000 representative sequences.  ...  Results and discussion We have developed a complete and fast algorithm (nrdb90) for removing near neighbours and fragments from large sequence collections.  ... 
doi:10.1093/bioinformatics/14.5.423 pmid:9682055 fatcat:w2rqcwzngrevpiuzf3iwoa5zwy

RSDB: representative protein sequence databases have high information content

J. Park, L. Holm, A. Heger, C. Chothia
2000 Bioinformatics  
Results: Comparisons of nine representative sequence databases (RSDB) derived from full protein databanks showed that the information content of sequence databases is not linearly proportional to its size  ...  Motivation: Biological sequence databases are highly redundant for two main reasons: 1. various databanks keep redundant sequences with many identical and nearly identical sequences 2. natural sequences  ...  distributed causing uneven profile weighting regardless of the level of near-neighbour removal.  ... 
doi:10.1093/bioinformatics/16.5.458 pmid:10871268 fatcat:i56ijyhnk5dmtj6tecbihzwzjq

Classification and Exploration of 3D Protein Domain Interactions Using Kbdock [chapter]

Anisah W. Ghoorah, Marie-Dominique Devignes, Malika Smaïl-Tabbone, David W. Ritchie
2016 Methods in Molecular Biology  
Keywords structural biology; structural homology; protein domains; protein domain family; domain-domain interactions; domain-peptide interactions; domain family interactions; domain family  ...  Abstract Comparing and classifying protein domain interactions according to their three-dimensional (3D) structures can help to understand protein structure-function and evolutionary relationships.  ...  The SCOP and CATH classifications use both sequence and structural similarities to collect protein domains in a hierarchical system of related domain families.  ... 
doi:10.1007/978-1-4939-3572-7_5 pmid:27115629 fatcat:qethyrkgibedxmsc5obhdrp7wu

Genome-wide survey and phylogeny of S-Ribosylhomocysteinase (LuxS) enzyme in bacterial genomes

Rajas M. Rao, Shaik Naseer Pasha, Ramanathan Sowdhamini
2016 BMC Genomics  
Results: Search for LuxS in the non-redundant database of protein sequences yielded 3106 sequences.  ...  A majority of the neighbouring genes of LuxS have been found to be hypothetical proteins.  ...  Synteny analysis of LuxS genes has shown the presence of large number of neighbouring genes annotated as hypothetical proteins suggesting a broader repertoire of biological functions are yet to be discovered  ... 
doi:10.1186/s12864-016-3002-x pmid:27650568 pmcid:PMC5029033 fatcat:3pqa5ngubfazbfrinwxtnahl7u

SPA: a short peptide assembler for metagenomic data

Youngik Yang, Shibu Yooseph
2013 Nucleic Acids Research  
Here, we present a method for reconstructing complete protein sequences directly from NGS metagenomic data.  ...  Using large simulated and real metagenomic data sets, we show that our method outperforms the alternate approach of identifying genes on nucleotide sequence assemblies and generates longer protein sequences  ...  The reference protein set R consisted of the non-redundant sequences obtained by clustering the full set of proteins from the chosen genomes using cd-hit at 95%.  ... 
doi:10.1093/nar/gkt118 pmid:23435317 pmcid:PMC3632116 fatcat:n6na2rlqnvhbvedsdzk3tcxyx4

Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information

S. Ahmad, M. M. Gromiha, A. Sarai
2004 Bioinformatics  
Overrepresentation of some residues in the binding sites was largely lost at the total sequence level, but a different kind of compositional preference was observed in DNA-binding proteins.  ...  Results: Sequence composition was found to provide sufficient information to predict the probability of its binding to DNA with nearly 69% sensitivity at 64% accuracy for the considered proteins; sequence  ...  Redundancy among sequences was first removed by using CD-HIT program from (Li et al., 2001) with a threshold of 40% sequence identity.  ... 
doi:10.1093/bioinformatics/btg432 pmid:14990443 fatcat:7qtsqn2thrc2toirblsqpou564

Screening and Functional Prediction of Conserved Hypothetical Proteins from Escherichia coli

William F Porto, Simone Maria-Neto
2014 Journal of Proteomics & Bioinformatics  
From this data set, the redundant proteins were removed through JalView [21] with a cut off of 80% of identity.  ...  next step, the redundancy removal.  ... 
doi:10.4172/jpb.1000321 fatcat:ecnhu3ddfjfcdkvfja43ctdglu

Charge environments around phosphorylation sites in proteins

James Kitchen, Rebecca E Saunders, Jim Warwicker
2008 BMC Structural Biology  
Structural analyses have identified the importance of charge-charge interactions, for example mediating phosphorylation-driven allosteric change and protein binding to phosphopeptides.  ...  (a) 100%, no removal of sequences. (b) Sequences culled at 25% sequence identity.  ...  These sets represent the complete phosphorylation data (100%), removal of near-identical copies (90%), and removal of copies that are clearly homologous (25%).  ... 
doi:10.1186/1472-6807-8-19 pmid:18366741 pmcid:PMC2291461 fatcat:yvmmu26yqnccjcoa343yngmiwy

Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin-proteasome system

Mainak Guharoy, Pallab Bhowmick, Mohamed Sallam, Peter Tompa
2016 Nature Communications  
These effects result from increased protein stability and interactome rewiring. The distributed nature of degrons ensures regulation, specificity and combinatorial control of degradation.  ...  The model of the proteasome has been adapted from ref. 67.  ...  This work was supported by the Odysseus grant G.0029.12 from Research Foundation Flanders (FWO) to P.T. and a fellowship from the Marie Curie Initial Training Network project 264257 (IDPbyNMR) from the  ... 
doi:10.1038/ncomms10239 pmid:26732515 pmcid:PMC4729826 fatcat:sgrxbj4tjzhsrmujnstzhazo7i

Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum)

Tao Ke, Caihua Dong, Han Mao, Yingzhong Zhao, Hong Chen, Hongyan Liu, Xuyan Dong, Chaobo Tong, Shengyi Liu
2011 BMC Plant Biology  
In this study, we carried out a large scale of expressed sequence tags (ESTs) sequencing from developing sesame seeds and further conducted analysis on seed storage products-related genes.  ...  Results: A normalized and full-length enriched cDNA library from 5~30 days old immature seeds was constructed and randomly sequenced, leading to generation of 41,248 expressed sequence tags (ESTs) which  ...  The cDNA library sequencing was conducted by BGI-Beijing.  ... 
doi:10.1186/1471-2229-11-180 pmid:22195973 pmcid:PMC3311628 fatcat:a327vlgrsrbffjgvta7u6n2t4y

A genomic perspective on human proteases

Christopher Southan
2001 FEBS Letters  
However, the data already indicate that the mechanistic class, sequence family and domain distribution of the genomic complement of proteases is unlikely to shift significantly from that already observed  ...  Over 400 human proteases documented in secondary databases can already be delineated in genomic sequence.  ...  The SwissProt/TrEMBL (SP-TR) protein database includes a non-redundant set of human sequences [9].  ... 
doi:10.1016/s0014-5793(01)02490-5 pmid:11412860 fatcat:d5bls25lvvbrblmjylrdvfwx34

Examining the Conservation of Kinks in Alpha Helices

Eleanor C. Law, Henry R. Wilman, Sebastian Kelm, Jiye Shi, Charlotte M. Deane, Hendrik W. van Veen
2016 PLoS ONE  
From this perspective, we examine the conservation of kinks in proteins.  ...  Sequence identity between homologous helices is informative in terms of kink conservation, but almost equally so is the sequence identity of residues in spatial proximity to the kink.  ...  Acknowledgments The authors would like to thank the EPSRC (grant number EP/G037280/1) and UCB Pharma for funding, and the Oxford Protein Informatics Group for discussion.  ... 
doi:10.1371/journal.pone.0157553 pmid:27314675 pmcid:PMC4912094 fatcat:rgiorxa34vgvrp3jmqt3ly6unm

Description of Thermogemmatispora carboxidivorans sp. nov., a carbon-monoxide-oxidizing member of the class Ktedonobacteria isolated from a geothermally heated biofilm, and analysis of carbon monoxide oxidation by members of the class Ktedonobacteria

C. E. King, G. M. King
2014 International Journal of Systematic and Evolutionary Microbiology  
Growth was measured as cell protein content. Subsamples were collected at appropriate time intervals from sealed serum bottles using needles and 1 cm³ syringes.  ...  In the mid-1930s, a near-surface magma intrusion heated the overlying soil and forest, leaving behind largely decayed tree stumps, which now act as steam vents (Smith, 1981).  ... 
doi:10.1099/ijs.0.059675-0 pmid:24425739 fatcat:yzhx32usabclln7mp7qbfu73zq

Prokaryotic and Highly-Repetitive WD40 Proteins: A Systematic Study

Xue-Jia Hu, Tuan Li, Yang Wang, Yao Xiong, Xian-Hui Wu, De-Lin Zhang, Zhi-Qiang Ye, Yun-Dong Wu
2017 Scientific Reports  
Although investigations of eukaryotic WD40 proteins have been frequently reported, prokaryotic ones remain largely uncharacterized.  ...  Comparisons show that a higher proportion of prokaryotic WD40s tend to contain multiple WD40 domains and a large number of hydrogen bond networks.  ...  The datasets generated and/or analysed during the current study are available from the corresponding authors on reasonable request.  ... 
doi:10.1038/s41598-017-11115-1 pmid:28878378 pmcid:PMC5587647 fatcat:htbpryz7hfh3ha37tut6bfkc7u