Removing near-neighbour redundancy from large protein sequence collections
1998
Bioinformatics
Availability: A regularly updated non-redundant protein sequence database (nrdb90), a server for homology searches against nrdb90, and a Perl script (nrdb90.pl) implementing the algorithm are available ...
reduction of 46%, from 260 000 unique sequences to 140 000 representative sequences. ...
Results and discussion We have developed a complete and fast algorithm (nrdb90) for removing near neighbours and fragments from large sequence collections. ...
doi:10.1093/bioinformatics/14.5.423
pmid:9682055
fatcat:w2rqcwzngrevpiuzf3iwoa5zwy
RSDB: representative protein sequence databases have high information content
2000
Bioinformatics
Results: Comparisons of nine representative sequence databases (RSDB) derived from full protein databanks showed that the information content of sequence databases is not linearly proportional to their size ...
Motivation: Biological sequence databases are highly redundant for two main reasons: 1. various databanks keep redundant sequences with many identical and nearly identical sequences 2. natural sequences ...
distributed causing uneven profile weighting regardless of the level of near-neighbour removal. ...
doi:10.1093/bioinformatics/16.5.458
pmid:10871268
fatcat:i56ijyhnk5dmtj6tecbihzwzjq
Classification and Exploration of 3D Protein Domain Interactions Using Kbdock
[chapter]
2016
Msphere
Keywords: structural biology; structural homology; protein domains; protein domain family; domain-domain interactions; domain-peptide interactions; domain family interactions; domain family ...
Abstract Comparing and classifying protein domain interactions according to their three-dimensional (3D) structures can help to understand protein structure-function and evolutionary relationships. ...
The SCOP and CATH classifications use both sequence and structural similarities to collect protein domains in a hierarchical system of related domain families. ...
doi:10.1007/978-1-4939-3572-7_5
pmid:27115629
fatcat:qethyrkgibedxmsc5obhdrp7wu
Genome-wide survey and phylogeny of S-Ribosylhomocysteinase (LuxS) enzyme in bacterial genomes
2016
BMC Genomics
Results: Search for LuxS in the non-redundant database of protein sequences yielded 3106 sequences. ...
A majority of the neighbouring genes of LuxS have been found to be hypothetical proteins. ...
Synteny analysis of LuxS genes has shown the presence of a large number of neighbouring genes annotated as hypothetical proteins, suggesting that a broader repertoire of biological functions is yet to be discovered ...
doi:10.1186/s12864-016-3002-x
pmid:27650568
pmcid:PMC5029033
fatcat:3pqa5ngubfazbfrinwxtnahl7u
SPA: a short peptide assembler for metagenomic data
2013
Nucleic Acids Research
Here, we present a method for reconstructing complete protein sequences directly from NGS metagenomic data. ...
Using large simulated and real metagenomic data sets, we show that our method outperforms the alternate approach of identifying genes on nucleotide sequence assemblies and generates longer protein sequences ...
The reference protein set R consisted of the non-redundant sequences obtained by clustering the full set of proteins from the chosen genomes using cd-hit at 95%. ...
doi:10.1093/nar/gkt118
pmid:23435317
pmcid:PMC3632116
fatcat:n6na2rlqnvhbvedsdzk3tcxyx4
Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information
2004
Bioinformatics
Overrepresentation of some residues in the binding sites was largely lost at the total sequence level, but a different kind of compositional preference was observed in DNA-binding proteins. ...
Results: Sequence composition was found to provide sufficient information to predict the probability of its binding to DNA with nearly 69% sensitivity at 64% accuracy for the considered proteins; sequence ...
Redundancy among sequences was first removed using the CD-HIT program from http://bioinformatics.burnhaminst.org/cd-hi (Li et al., 2001) with a threshold of 40% sequence identity. ...
doi:10.1093/bioinformatics/btg432
pmid:14990443
fatcat:7qtsqn2thrc2toirblsqpou564
Comparative genomics tools applied to bioterrorism defence
2003
Briefings in Bioinformatics
and genetic near-neighbour species. ...
neighbour sequence. ...
strains than are the nucleotide sequences. ...
doi:10.1093/bib/4.2.133
pmid:12846395
fatcat:2oevif6qpbfrhipp7mectbmww4
Screening and Functional Prediction of Conserved Hypothetical Proteins from Escherichia coli
2014
Journal of Proteomics & Bioinformatics
From this data set, the redundant proteins were removed using JalView [21] with a cutoff of 80% identity. ...
next step, the redundancy removal. ...
doi:10.4172/jpb.1000321
fatcat:ecnhu3ddfjfcdkvfja43ctdglu
Charge environments around phosphorylation sites in proteins
2008
BMC Structural Biology
Structural analyses have identified the importance of charge-charge interactions, for example mediating phosphorylation-driven allosteric change and protein binding to phosphopeptides. ...
(a) 100%, no removal of sequences. (b) Sequences culled at 25% sequence identity. ...
These sets represent the complete phosphorylation data (100%), removal of near-identical copies (90%), and removal of copies that are clearly homologous (25%). ...
doi:10.1186/1472-6807-8-19
pmid:18366741
pmcid:PMC2291461
fatcat:yvmmu26yqnccjcoa343yngmiwy
Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin-proteasome system
2016
Nature Communications
These effects result from increased protein stability and interactome rewiring. The distributed nature of degrons ensures regulation, specificity and combinatorial control of degradation. ...
The model of the proteasome has been adapted from ref. 67. ...
This work was supported by the Odysseus grant G.0029.12 from Research Foundation Flanders (FWO) to P.T. and a fellowship from the Marie Curie Initial Training Network project 264257 (IDPbyNMR) from the ...
doi:10.1038/ncomms10239
pmid:26732515
pmcid:PMC4729826
fatcat:sgrxbj4tjzhsrmujnstzhazo7i
Analysis of expressed sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum)
2011
BMC Plant Biology
In this study, we carried out large-scale sequencing of expressed sequence tags (ESTs) from developing sesame seeds and further analysed genes related to seed storage products. ...
Results: A normalized and full-length enriched cDNA library from immature seeds 5 to 30 days old was constructed and randomly sequenced, generating 41,248 expressed sequence tags (ESTs) which ...
The cDNA library sequencing was conducted by BGI-Beijing. ...
doi:10.1186/1471-2229-11-180
pmid:22195973
pmcid:PMC3311628
fatcat:a327vlgrsrbffjgvta7u6n2t4y
A genomic perspective on human proteases
2001
FEBS Letters
However, the data already indicate that the mechanistic class, sequence family and domain distribution of the genomic complement of proteases is unlikely to shift significantly from that already observed ...
Over 400 human proteases documented in secondary databases can already be delineated in genomic sequence. ...
The SwissProt/TrEMBL (SP-TR) protein database includes a non-redundant set of human sequences [9]. ...
doi:10.1016/s0014-5793(01)02490-5
pmid:11412860
fatcat:d5bls25lvvbrblmjylrdvfwx34
Examining the Conservation of Kinks in Alpha Helices
2016
PLoS ONE
From this perspective, we examine the conservation of kinks in proteins. ...
Sequence identity between homologous helices is informative in terms of kink conservation, but almost equally so is the sequence identity of residues in spatial proximity to the kink. ...
Acknowledgments The authors would like to thank the EPSRC (grant number EP/G037280/1) and UCB Pharma for funding, and the Oxford Protein Informatics Group for discussion. ...
doi:10.1371/journal.pone.0157553
pmid:27314675
pmcid:PMC4912094
fatcat:rgiorxa34vgvrp3jmqt3ly6unm
Description of Thermogemmatispora carboxidivorans sp. nov., a carbon-monoxide-oxidizing member of the class Ktedonobacteria isolated from a geothermally heated biofilm, and analysis of carbon monoxide oxidation by members of the class Ktedonobacteria
2014
International Journal of Systematic and Evolutionary Microbiology
Growth was measured as cell protein content. Subsamples were collected at appropriate time intervals from sealed serum bottles using needles and 1 cm³ syringes. ...
In the mid-1930s, a near-surface magma intrusion heated the overlying soil and forest, leaving behind largely decayed tree stumps, which now act as steam vents (Smith, 1981). ...
doi:10.1099/ijs.0.059675-0
pmid:24425739
fatcat:yzhx32usabclln7mp7qbfu73zq
Prokaryotic and Highly-Repetitive WD40 Proteins: A Systematic Study
2017
Scientific Reports
Although investigations of eukaryotic WD40 proteins have been frequently reported, prokaryotic ones remain largely uncharacterized. ...
Comparisons show that a higher proportion of prokaryotic WD40s tend to contain multiple WD40 domains and a large number of hydrogen bond networks. ...
The datasets generated and/or analysed during the current study are available from the corresponding authors on reasonable request. ...
doi:10.1038/s41598-017-11115-1
pmid:28878378
pmcid:PMC5587647
fatcat:htbpryz7hfh3ha37tut6bfkc7u