A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare ... s/ tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments; and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence, and we measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
doi:10.1093/nar/gks1301 pmid:23268451 pmcid:PMC3561995 fatcat:oeohciho5ze7vbarqoogqidbfu
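The idea of aligning DNA motifs as matrices instead of single sequences can be illustrated with a minimal sketch. This is not TFcompare and not the benchmark method of the paper: the column similarity (one minus half the squared Euclidean distance), the ungapped sliding scheme and all function names are assumptions chosen only for illustration.

```python
# Toy ungapped comparison of two position frequency matrices.
# Each matrix is a list of columns; each column holds frequencies for A, C, G, T.
# The scoring scheme and names are illustrative assumptions, not a published method.

BASES = "ACGT"

def one_hot(seq):
    """Turn a plain DNA string into a matrix with one certain base per column."""
    return [[1.0 if base == x else 0.0 for base in BASES] for x in seq.upper()]

def column_similarity(p, q):
    """1.0 for identical columns, 0.0 for two different one-hot columns."""
    return 1.0 - 0.5 * sum((a - b) ** 2 for a, b in zip(p, q))

def best_ungapped_alignment(m1, m2, min_overlap=2):
    """Slide m2 along m1 and return (offset, score) of the best overlap.

    offset is the column of m1 that is paired with the first column of m2;
    negative offsets mean that m2 starts before m1.
    """
    best_offset, best_score = None, float("-inf")
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        score = 0.0
        for i, column in enumerate(m1):
            j = i - offset  # column of m2 paired with column i of m1
            if 0 <= j < len(m2):
                score += column_similarity(column, m2[j])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

if __name__ == "__main__":
    site = one_hot("TGACGT")
    core = one_hot("ACG")
    print(best_ungapped_alignment(site, core))
```

Because the score is a plain sum over paired columns, longer overlaps are favoured; a real comparison would also consider the reverse strand and some measure of statistical significance.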
FootprintDB is a database and search engine that compiles regulatory sequences from open access libraries of curated DNA cis-elements and motifs, and their associated transcription factors (TFs). It systematically annotates the binding interfaces of the TFs by exploiting protein-DNA complexes deposited in the Protein Data Bank. Each entry in footprintDB is thus a DNA motif linked to the protein sequence of the TF(s) known to recognize it, and in most cases, the set of predicted interface residues involved in specific recognition. This chapter explains step-by-step how to search for DNA motifs and protein sequences in footprintDB and how to focus the search on a particular organism. Two real-world examples are shown where this software was used to analyze transcriptional regulation in plants. Results are described with the aim of guiding users on their interpretation, and special attention is given to the choices users might face when performing similar analyses.
doi:10.1007/978-1-4939-6396-6_17 pmid:27557773 fatcat:wnjluag6mfazflewblzgmvosqi
Intrinsically disordered proteins, found in all living organisms, are essential for basic cellular functions and complement the function of ordered proteins. It has been shown that protein disorder is linked to the G + C content of the genome. Furthermore, recent investigations have suggested that the evolutionary dynamics of the plant nucleus adds disordered segments to open reading frames alike, and these segments are not necessarily conserved among orthologous genes. Results: In the present work the distribution of intrinsically disordered proteins along the chromosomes of several representative plants was analyzed. The reported results support a non-random distribution of disordered proteins along the chromosomes of Arabidopsis thaliana and Oryza sativa, two model eudicot and monocot plant species, respectively. In fact, for most chromosomes positive correlations between the frequency of disordered segments of 30+ amino acids and both recombination rates and G + C content were observed. Conclusions: These analyses demonstrate that the presence of disordered segments among plant proteins is associated with the rates of genetic recombination of their encoding genes. Altogether, these findings suggest that high recombination rates, as well as chromosomal rearrangements, could induce disordered segments in proteins during evolution.
doi:10.1186/1471-2164-14-772 pmid:24206529 pmcid:PMC3828576 fatcat:dxstzpn4pjhbheacwv7avwx5lu
BMC Plant Biology
Flavin adenine dinucleotide synthetases (FADSs), a group of bifunctional enzymes that carry out the dual functions of riboflavin phosphorylation to produce flavin mononucleotide (FMN) and its subsequent adenylation to generate FAD in most prokaryotes, were studied in plants in terms of sequence, structure and evolutionary history. Results: Using a variety of bioinformatics methods we have found that FADS enzymes localized to the chloroplasts, which we term plant-like FADS proteins, are distributed across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. The C-terminal module of these enzymes does not contain the typical riboflavin kinase active site sequence, while the N-terminal module is broadly conserved. These results agree with previous work reported by Sandoval et al. in 2008. Furthermore, our observations and preliminary experimental results indicate that the C-terminus of plant-like FADS proteins may contain a catalytic activity, but one different from that of their prokaryotic counterparts. In fact, homology models predict that plant-specific conserved residues constitute a distinct active site in the C-terminus. Conclusions: A structure-based sequence alignment and an in-depth evolutionary survey of FADS proteins, thought to be crucial in plant metabolism, are reported, which will be essential for the correct annotation of plant genomes and further structural and functional studies. This work is a contribution to our understanding of the evolutionary history of plant-like FADS enzymes, which constitute a new family of FADS proteins whose C-terminal module might be involved in a distinct catalytic activity.
doi:10.1186/1471-2148-10-311 pmid:20955574 pmcid:PMC2972280 fatcat:agogwn2iyzb45pyyo2a5t6rjzi
3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein-DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with the expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.
doi:10.1093/nar/gkp781 pmid:19767616 pmcid:PMC2808867 fatcat:5pwlfif6izdpdcojkmbf2c4pkm
One option is FootprintDB (16), which is a meta-database encompassing 14 up-to-date motif databases (see chapter by Contreras-Moreira and Sebastián in this Volume). ... In plant genomes, repeated elements may result from various sources: transposons, polyploidy, etc. (see chapter by Contreras-Moreira, Castro-Mondragon et al. in this Volume). ...
doi:10.1007/978-1-4939-6396-6_19 pmid:27557775 fatcat:uf4jvjkdw5ed5h4jy6bhwui3si
Abbreviations: TF = Transcription Factor, ZF = C2H2/C2HC zinc fingers, HE = Homing endonucleases, RE = Restriction endonucleases, LR = lambda repressor-like, H = Homeodomain-like, P53 = p53-like, WH = Winged helix, GR = Glucocorticoid receptor-like, RHH = Ribbon-helix-helix, RMSD = root mean square deviation, IAS = interface alignment score, DBD = DNA binding domain
doi:10.1002/prot.22525 pmid:19731374 fatcat:lhauocdqqjcrjdz5bbedwe26au
Bruno Contreras-Moreira et al. ...
doi:10.1007/978-1-0716-2067-0_2 pmid:35037199 fatcat:lx6odri4lvfrzkutlccjpuzgg4
., 2018) and performed the motif discovery protocol described in Contreras-Moreira et al. (2016) and Ksouri et al. (2021). ... The resulting motifs were compared to motifs annotated in the footprintDB database (Sebastian and Contreras-Moreira, 2014). ...
doi:10.1101/2021.02.03.429565 fatcat:7yesjeyonffcjaiyac2nwlisom
., 2011; Yruela and Contreras-Moreira, 2012; Yruela et al., 2017). ... These data are in agreement with previous results (Yruela and Contreras-Moreira, 2013). ...
doi:10.3389/fpls.2018.01216 pmid:30177944 pmcid:PMC6109787 fatcat:prgzp5krerd6jf46y2lcatt4ry
doi:10.1093/bioinformatics/btm148 pmid:17459960 fatcat:dzs5lj3nkzcg7bckpsmbvgh3gm
This article comments on: Dixon LE, Pasquariello M, Boden SA. 2020. TEOSINTE BRANCHED1 regulates height and stem internode length in bread wheat. Journal of Experimental Botany 71, 4742–4750.
doi:10.1093/jxb/eraa308 pmid:32761247 fatcat:vuveghiwevcdfjsqlte7y5k6ny
The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis or pangenome exploration. While homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here we benchmark a two-step approach, where repeats are first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, using the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat and nrTEplants, curated for this work). We obtained repeated genome fractions that match those reported in the literature, but with shorter repeated elements than those produced with conventional annotators. Inspection of masked regions overlapping genes revealed no preference for specific protein domains. Half of Red masked sequences can be successfully classified with nrTEplants, with the complete protocol taking less than 2 h on a desktop Linux box. The repeat library and the scripts to mask and annotate plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.
doi:10.1101/2021.03.22.436504 fatcat:ifv7wmw3zbcqll6bm44n2ukt2i
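The first step of the approach described above, calling repeats by k-mer counting, can be sketched in a few lines. This is a toy illustration only and does not reproduce how Red works: the function names, the fixed count threshold and the decision to ignore the reverse strand are all assumptions made for brevity.

```python
from collections import Counter

# Toy k-mer based soft-masking: positions covered by frequently seen k-mers
# are written in lowercase. Names and thresholds are illustrative assumptions.

def kmer_counts(seq, k):
    """Count every overlapping k-mer of the (uppercased) sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def soft_mask(seq, k=4, min_count=3):
    """Lowercase every base covered by a k-mer seen at least min_count times."""
    upper = seq.upper()
    counts = kmer_counts(upper, k)
    masked = [False] * len(upper)
    for i in range(len(upper) - k + 1):
        if counts[upper[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(c.lower() if m else c for c, m in zip(upper, masked))

if __name__ == "__main__":
    print(soft_mask("ACGTACGTACGTTTGCA", k=4, min_count=3))
```

In this sketch the masked lowercase segments would then be the candidates for the second step, annotation by comparison against a repeat library, which is not shown here.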
Lecture Notes in Computer Science
Cantalapiedra, and Bruno Contreras-Moreira Pairwise alignments of Homeobox domains Pairs of protein sequences from the validation set were aligned with the BLASTP program. ...
doi:10.1007/978-3-642-28062-7_8 fatcat:taow7ed2pvgohg5kmp7gwe534u