The twilight zone of cis element alignments

Alvaro Sebastian, Bruno Contreras-Moreira
2012 Nucleic Acids Research  
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare ... s/ tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
doi:10.1093/nar/gks1301 pmid:23268451 pmcid:PMC3561995 fatcat:oeohciho5ze7vbarqoogqidbfu
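
The abstract above contrasts aligning DNA motifs as matrices with aligning individual sequences. The following is only a minimal sketch of what an ungapped matrix-to-matrix alignment can look like; it is not TFcompare and not the scoring used in the paper. The function names, the column similarity (one minus the total variation distance) and the `min_overlap` parameter are all assumptions made for illustration.

```python
# Illustrative sketch only: ungapped alignment of two DNA motifs given as
# position frequency matrices (one [A, C, G, T] frequency list per column).
# The scoring scheme and all names here are assumptions, not the paper's method.

def one_hot(seq):
    """Turn a plain DNA string into a matrix with one [A, C, G, T] column per base."""
    order = "ACGT"
    return [[1.0 if base == b else 0.0 for b in order] for base in seq.upper()]

def column_similarity(p, q):
    """Similarity in [0, 1]: one minus the total variation distance of two columns."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def best_ungapped_alignment(m1, m2, min_overlap=2):
    """Slide m2 along m1 and return (score, offset) of the best placement.

    offset is the column of m1 that is aligned with column 0 of m2;
    the score is the sum of column similarities over the overlap.
    """
    best = (float("-inf"), None)
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        score, overlap = 0.0, 0
        for i, col in enumerate(m1):
            j = i - offset
            if 0 <= j < len(m2):
                score += column_similarity(col, m2[j])
                overlap += 1
        if overlap >= min_overlap and score > best[0]:
            best = (score, offset)
    return best
```

Because each column holds frequencies rather than a single base, partially degenerate positions still contribute to the score, which is one intuition for why matrices can align more robustly than single sites.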

FootprintDB: Analysis of Plant Cis-Regulatory Elements, Transcription Factors, and Binding Interfaces [chapter]

Bruno Contreras-Moreira, Alvaro Sebastian
2016 Methods in Molecular Biology
FootprintDB is a database and search engine that compiles regulatory sequences from open access libraries of curated DNA cis-elements and motifs, and their associated transcription factors (TFs). It systematically annotates the binding interfaces of the TFs by exploiting protein-DNA complexes deposited in the Protein Data Bank. Each entry in footprintDB is thus a DNA motif linked to the protein sequence of the TF(s) known to recognize it, and in most cases, the set of predicted interface ... s involved in specific recognition. This chapter explains step-by-step how to search for DNA motifs and protein sequences in footprintDB and how to focus the search to a particular organism. Two real-world examples are shown where this software was used to analyze transcriptional regulation in plants. Results are described with the aim of guiding users on their interpretation, and special attention is given to the choices users might face when performing similar analyses.
doi:10.1007/978-1-4939-6396-6_17 pmid:27557773 fatcat:wnjluag6mfazflewblzgmvosqi

Genetic recombination is associated with intrinsic disorder in plant proteomes

Inmaculada Yruela, Bruno Contreras-Moreira
2013 BMC Genomics  
Intrinsically disordered proteins, found in all living organisms, are essential for basic cellular functions and complement the function of ordered proteins. It has been shown that protein disorder is linked to the G + C content of the genome. Furthermore, recent investigations have suggested that the evolutionary dynamics of the plant nucleus adds disordered segments to open reading frames alike, and these segments are not necessarily conserved among orthologous genes. Results: In the present ... ork the distribution of intrinsically disordered proteins along the chromosomes of several representative plants was analyzed. The reported results support a non-random distribution of disordered proteins along the chromosomes of Arabidopsis thaliana and Oryza sativa, two model eudicot and monocot plant species, respectively. In fact, for most chromosomes positive correlations between the frequency of disordered segments of 30+ amino acids and both recombination rates and G + C content were observed. Conclusions: These analyses demonstrate that the presence of disordered segments among plant proteins is associated with the rates of genetic recombination of their encoding genes. Altogether, these findings suggest that high recombination rates, as well as chromosomal rearrangements, could induce disordered segments in proteins during evolution.
doi:10.1186/1471-2164-14-772 pmid:24206529 pmcid:PMC3828576 fatcat:dxstzpn4pjhbheacwv7avwx5lu

Protein disorder in plants: a view from the chloroplast

Inmaculada Yruela, Bruno Contreras-Moreira
2012 BMC Plant Biology  
doi:10.1186/1471-2229-12-165 pmid:22970728 pmcid:PMC3460767 fatcat:uoudzepovvemrfw3bzy656ux5u

Evolutionary divergence of chloroplast FAD synthetase proteins

Inmaculada Yruela, Sonia Arilla-Luna, Milagros Medina, Bruno Contreras-Moreira
2010 BMC Evolutionary Biology  
Flavin adenine dinucleotide synthetases (FADSs), a group of bifunctional enzymes that carry out the dual functions of riboflavin phosphorylation to produce flavin mononucleotide (FMN) and its subsequent adenylation to generate FAD in most prokaryotes, were studied in plants in terms of sequence, structure and evolutionary history. Results: Using a variety of bioinformatics methods we have found that FADS enzymes localized to the chloroplasts, which we term plant-like FADS proteins, are ... buted across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. The C-terminal module of these enzymes does not contain the typical riboflavin kinase active site sequence, while the N-terminal module is broadly conserved. These results agree with previous work reported by Sandoval et al. in 2008. Furthermore, our observations and preliminary experimental results indicate that the C-terminus of plant-like FADS proteins may contain a catalytic activity, but one different from that of their prokaryotic counterparts. In fact, homology models predict that plant-specific conserved residues constitute a distinct active site in the C-terminus. Conclusions: A structure-based sequence alignment and an in-depth evolutionary survey of FADS proteins, thought to be crucial in plant metabolism, are reported, which will be essential for the correct annotation of plant genomes and further structural and functional studies. This work is a contribution to our understanding of the evolutionary history of plant-like FADS enzymes, which constitute a new family of FADS proteins whose C-terminal module might be involved in a distinct catalytic activity.
doi:10.1186/1471-2148-10-311 pmid:20955574 pmcid:PMC2972280 fatcat:agogwn2iyzb45pyyo2a5t6rjzi

3D-footprint: a database for the structural analysis of protein–DNA complexes

Bruno Contreras-Moreira
2009 Nucleic Acids Research  
3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein-DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs ... d footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead. with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.
doi:10.1093/nar/gkp781 pmid:19767616 pmcid:PMC2808867 fatcat:5pwlfif6izdpdcojkmbf2c4pkm

RSAT::Plants: Motif Discovery in ChIP-Seq Peaks of Plant Genomes [chapter]

Jaime A. Castro-Mondragon, Claire Rioualen, Bruno Contreras-Moreira, Jacques van Helden
2016 Methods in Molecular Biology
One option is FootprintDB(16), which is a meta-database encompassing 14 up-to-date motif databases (see chapter by Contreras-Moreira and Sebastián in this Volume).  ...  In plant genomes, repeated elements may result from various sources: transposons, polyploidy, etc. (see chapter by Contreras-Moreira, Castro-Mondragon et al. in this Volume).  ... 
doi:10.1007/978-1-4939-6396-6_19 pmid:27557775 fatcat:uf4jvjkdw5ed5h4jy6bhwui3si

Comparison of DNA binding across protein superfamilies

Bruno Contreras-Moreira, Javier Sancho, Vladimir Espinosa Angarica
2009 Proteins: Structure, Function, and Bioinformatics  
Abbreviations: TF = Transcription Factor, ZF = C2H2/C2HC zinc fingers, HE = Homing endonucleases, RE = Restriction endonucleases, LR = lambda repressor-like, H = Homeodomain-like, P53 = p53-like, WH = Winged helix, GR = Glucocorticoid receptor-like, RHH = Ribbon-helix-helix, RMSD = root mean square deviation, IAS = interface alignment score, DBD = DNA binding domain
doi:10.1002/prot.22525 pmid:19731374 fatcat:lhauocdqqjcrjdz5bbedwe26au

Scripting Analyses of Genomes in Ensembl Plants [chapter]

Bruno Contreras-Moreira, Guy Naamati, Marc Rosello, James E. Allen, Sarah E. Hunt, Matthieu Muffato, Astrid Gall, Paul Flicek
2022 Methods in Molecular Biology
Bruno Contreras-Moreira et al.  ... 
doi:10.1007/978-1-0716-2067-0_2 pmid:35037199 fatcat:lx6odri4lvfrzkutlccjpuzgg4

Light spectra trigger divergent gene expression in barley cultivars [article]

Arantxa Monteagudo, Alvaro Rodríguez del Rio, Bruno Contreras-Moreira, Tibor Kiss, Marianna Mayer, Ildikó Karsai, Ernesto Igartua, Ana M Casas
2021 bioRxiv   pre-print
., 2018) and performed the motif discovery protocol described in Contreras-Moreira et al. (2016) and Ksouri et al. (2021).  ...  The resulting motifs were compared to motifs annotated in the footprintDB database (Sebastian and Contreras-Moreira, 2014).  ... 
doi:10.1101/2021.02.03.429565 fatcat:7yesjeyonffcjaiyac2nwlisom

Evolution of Protein Ductility in Duplicated Genes of Plants

Inmaculada Yruela, Bruno Contreras-Moreira, A. Keith Dunker, Karl J. Niklas
2018 Frontiers in Plant Science  
., 2011; Yruela and Contreras-Moreira, 2012; Yruela et al., 2017).  ...  These data are in agreement with previous results (Yruela and Contreras-Moreira, 2013).  ... 
doi:10.3389/fpls.2018.01216 pmid:30177944 pmcid:PMC6109787 fatcat:prgzp5krerd6jf46y2lcatt4ry

TFmodeller: comparative modelling of protein–DNA complexes

Bruno Contreras-Moreira, Pierre-Alain Branger, Julio Collado-Vides
2007 Bioinformatics
doi:10.1093/bioinformatics/btm148 pmid:17459960 fatcat:dzs5lj3nkzcg7bckpsmbvgh3gm

TB1: from domestication gene to tool for many trades

Ernesto Igartua, Bruno Contreras-Moreira, Ana M Casas
2020 Journal of Experimental Botany  
This article comments on: Dixon LE, Pasquariello M, Boden SA. 2020. TEOSINTE BRANCHED1 regulates height and stem internode length in bread wheat. Journal of Experimental Botany 71, 4742–4750.
doi:10.1093/jxb/eraa308 pmid:32761247 fatcat:vuveghiwevcdfjsqlte7y5k6ny

Efficient masking of plant genomes by combining kmer counting and curated repeats [article]

Bruno Contreras-Moreira, Carla V Filippi, Guy Naamati, Carlos García Girón, James E Allen, Paul Flicek
2021 bioRxiv   pre-print
The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis or pangenome exploration. While homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here we benchmark a two-step approach, where repeats are first called by k-mer counting and then annotated by comparison to ... d libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, using the kmer-based Repeat Detector (Red) and two repeat libraries (REdat and nrTEplants, curated for this work). We obtained repeated genome fractions that match those reported in the literature, but with shorter repeated elements than those produced with conventional annotators. Inspection of masked regions overlapping genes revealed no preference for specific protein domains. Half of Red masked sequences can be successfully classified with nrTEplants, with the complete protocol taking less than 2h on a desktop Linux box. The repeat library and the scripts to mask and annotate plant genomes can be obtained at .
doi:10.1101/2021.03.22.436504 fatcat:ifv7wmw3zbcqll6bm44n2ukt2i
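
The first step described in the abstract above, calling repeats by k-mer counting, can be illustrated with a toy example. This is a deliberately simplified sketch and not the algorithm implemented in Red; the function names and the `k` and `min_count` parameters are hypothetical choices made for illustration. It soft-masks (lowercases) every position covered by a k-mer that occurs at least `min_count` times in the sequence.

```python
# Toy illustration of k-mer-count-based soft-masking.
# NOT the Red algorithm; names and thresholds are assumptions for this sketch.
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in the (uppercased) sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def soft_mask(seq, k=4, min_count=3):
    """Lowercase all positions covered by a k-mer seen at least min_count times."""
    upper = seq.upper()
    counts = kmer_counts(upper, k)
    masked = list(upper)
    for i in range(len(upper) - k + 1):
        if counts[upper[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)

if __name__ == "__main__":
    print(soft_mask("ACGTACGTACGTTTGCA", k=4, min_count=3))
```

In a two-step protocol of the kind the abstract describes, the lowercased intervals would then be compared against a curated repeat library for annotation; that second step is not sketched here.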

Interface Similarity Improves Comparison of DNA-Binding Proteins: The Homeobox Example [chapter]

Álvaro Sebastián, Carlos P. Cantalapiedra, Bruno Contreras-Moreira
2012 Lecture Notes in Computer Science  
Cantalapiedra, and Bruno Contreras-Moreira Pairwise alignments of Homeobox domains Pairs of protein sequences from the validation set were aligned with the BLASTP program [8].  ... 
doi:10.1007/978-3-642-28062-7_8 fatcat:taow7ed2pvgohg5kmp7gwe534u