
Isochore chromosome maps of eukaryotic genomes

José L Oliver, Pedro Bernaola-Galván, Pedro Carpena, Ramón Román-Roldán
2001 Gene  
Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (≫300 kb on average) relatively homogeneous in G+C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. On the contrary, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.
doi:10.1016/s0378-1119(01)00641-2 pmid:11591471 fatcat:h5hgthldzrbidj5gxbqt2a76ki
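
The abstract above does not spell out the entropic measure, so the following is only a minimal sketch of one segmentation step, assuming the Jensen-Shannon divergence between the G+C compositions of the left and right subsequences is what gets maximized. The function names (`binary_entropy`, `best_cut`) are hypothetical, and the significance test and recursive splitting that a full method would need are omitted.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a two-symbol source with P(G or C) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def best_cut(sequence):
    """Return (position, divergence) of the cut that maximizes the
    Jensen-Shannon divergence between left and right G+C compositions.

    Assumption: the sequence is reduced to a binary alphabet
    (G/C versus everything else) before the divergence is computed.
    """
    gc = [1 if base in "GC" else 0 for base in sequence.upper()]
    n = len(gc)
    total_gc = sum(gc)
    h_whole = binary_entropy(total_gc / n)

    best_pos, best_js = None, -1.0
    left_gc = 0
    for i in range(1, n):          # the cut falls between index i-1 and i
        left_gc += gc[i - 1]
        right_gc = total_gc - left_gc
        js = (h_whole
              - (i / n) * binary_entropy(left_gc / i)
              - ((n - i) / n) * binary_entropy(right_gc / (n - i)))
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


if __name__ == "__main__":
    toy = "AT" * 50 + "GC" * 50    # AT-rich block followed by GC-rich block
    print(best_cut(toy))
```

On the toy sequence the cut lands on the boundary between the two blocks. A real segmentation would accept the cut only if it passes a significance threshold and would then recurse into each half.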

Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes

Pedro Bernaola-Galván, José L Oliver, Pedro Carpena, Oliver Clay, Giorgio Bernardi
2004 Gene  
The sequencing of prokaryotic genomes covering a wide taxonomic range has sparked renewed interest in intrachromosomal compositional (GC) heterogeneity, largely in view of lateral transfers. We present here a brief overview of some methods for visualizing and quantifying GC variation in prokaryotes. We used these methods to examine heterogeneity levels in sequenced prokaryotes, for a range of scales or stringencies. Some species are consistently homogeneous, whereas others are markedly heterogeneous in comparison, in particular Aeropyrum pernix, Xylella fastidiosa, Mycoplasma genitalium, Enterococcus faecalis, Bacillus subtilis, Pyrobaculum aerophilum, Vibrio vulnificus chromosome I, Deinococcus radiodurans chromosome II and Halobacterium. As we discuss here, the wide range of heterogeneities calls for reexamination of an accepted belief, namely that the endogenous DNA of bacteria and archaea should typically exhibit low intrachromosomal GC contrasts. Supplementary results for all species analyzed are available at our website:
doi:10.1016/j.gene.2004.02.042 pmid:15177687 fatcat:4q4qi4pribe2bgosadofwr3oo4

Phylogenetic distribution of large-scale genome patchiness

José L Oliver, Pedro Bernaola-Galván, Michael Hackenberg, Pedro Carpena
2008 BMC Evolutionary Biology  
The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results: The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
doi:10.1186/1471-2148-8-107 pmid:18405379 pmcid:PMC2397391 fatcat:hwibysbbj5hvrhg4zeqmcdj4dq

On the Validity of Detrended Fluctuation Analysis at Short Scales

Pedro Carpena, Manuel Gómez-Extremera, Pedro A. Bernaola-Galván
2021 Entropy  
Detrended Fluctuation Analysis (DFA) has become a standard method to quantify the correlations and scaling properties of real-world complex time series. For a given scale ℓ of observation, DFA provides the function F(ℓ), which quantifies the fluctuations of the time series around the local trend, which is subtracted (detrended). If the time series exhibits scaling properties, then F(ℓ) ∼ ℓ^α asymptotically, and the scaling exponent α is typically estimated as the slope of a linear fitting in the log F(ℓ) vs. log(ℓ) plot. In this way, α measures the strength of the correlations and characterizes the underlying dynamical system. However, in many cases, and especially in physiological time series, the scaling behavior is different at short and long scales, resulting in log F(ℓ) vs. log(ℓ) plots with two different slopes, α₁ at short scales and α₂ at large scales of observation. These two exponents are usually associated with the existence of different mechanisms that work at distinct time scales acting on the underlying dynamical system. Here, however, and since the power-law behavior of F(ℓ) is asymptotic, we question the use of α₁ to characterize the correlations at short scales. To this end, we show first that, even for artificial time series with perfect scaling, i.e., with a single exponent α valid for all scales, DFA provides an α₁ value that systematically overestimates the true exponent α. Second, when artificial time series with two different scaling exponents at short and large scales are considered, the α₁ value provided by DFA not only can severely underestimate or overestimate the true short-scale exponent, but also depends on the value of the large-scale exponent. This behavior should prevent the use of α₁ to describe the scaling properties at short scales: if DFA is used in two time series with the same scaling behavior at short scales but very different scaling properties at large scales, very different values of α₁ will be obtained, although the short-scale properties are identical. These artifacts may lead to wrong interpretations when analyzing real-world time series: on the one hand, for time series with truly perfect scaling, the spurious value of α₁ could lead to wrongly thinking that there exists some specific mechanism acting only at short time scales in the dynamical system. On the other hand, for time series with true different scaling at short and large scales, the incorrect α₁ value would not characterize properly the short-scale behavior of the dynamical system.
doi:10.3390/e24010061 pmid:35052087 pmcid:PMC8775092 fatcat:73pb5y5ygbgata55qdylb226fm
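
To make the quantities in the abstract above concrete, here is a minimal sketch of DFA, assuming first-order (linear) detrending on non-overlapping windows, which is one common variant and not necessarily the exact setup used in the paper. The names `dfa_fluctuation` and `dfa_exponent` are hypothetical. The sketch computes F(ℓ) for a list of scales and estimates α as the least-squares slope of log F(ℓ) against log ℓ.

```python
import math
import random


def dfa_fluctuation(series, scale):
    """F(scale): RMS deviation of the integrated series from a local linear
    trend fitted in each non-overlapping window of length `scale`."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:               # integrate the mean-subtracted series
        running += value - mean
        profile.append(running)

    n_windows = len(profile) // scale
    t_mean = (scale - 1) / 2.0
    t_var = sum((t - t_mean) ** 2 for t in range(scale))

    squared = 0.0
    for w in range(n_windows):
        seg = profile[w * scale:(w + 1) * scale]
        y_mean = sum(seg) / scale
        slope = sum((t - t_mean) * (y - y_mean)
                    for t, y in enumerate(seg)) / t_var
        # accumulate squared residuals around the fitted line
        squared += sum((y - (y_mean + slope * (t - t_mean))) ** 2
                       for t, y in enumerate(seg))
    return math.sqrt(squared / (n_windows * scale))


def dfa_exponent(series, scales):
    """Least-squares slope of log F(l) versus log l over the given scales."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(dfa_fluctuation(series, s)) for s in scales]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    random.seed(1)
    white_noise = [random.gauss(0.0, 1.0) for _ in range(8192)]
    print(dfa_exponent(white_noise, [16, 32, 64, 128, 256]))
```

For uncorrelated noise the estimate should come out close to 0.5. Fitting the slope separately over small and large scales gives the α₁ and α₂ estimates whose interpretation the abstract questions.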

A standalone version of IsoFinder for the computational prediction of isochores in genome sequences [article]

Pedro Bernaola-Galván, Pedro Carpena, José L. Oliver
2008 arXiv   pre-print
and Oliver 2003; Li, Bernaola-Galvan, Haghighi and Grosse 2002; Oliver, Bernaola-Galvan, Carpena and Roman-Roldan 2001; Oliver, Carpena, Hackenberg and Bernaola-Galvan 2004; Oliver, Carpena, Roman-Roldan  ...  versions of the algorithm (Bernaola-Galván, Román-Roldán and Oliver 1996; Oliver, Bernaola-Galvan, Carpena and Roman-Roldan 2001; Oliver, Carpena, Roman-Roldan, Mata-Balaguer, Mejias-Romero, Hackenberg  ... 
arXiv:0806.1292v1 fatcat:uzvj7yklwfdejmuw6t2tdvnomy

Erratum: Retraction Note to: Metal–insulator transition in chains with correlated disorder

Pedro Carpena, Pedro Bernaola-Galván, Plamen Ch. Ivanov, H. Eugene Stanley
2002 Nature  
doi:10.1038/nature00948 pmid:12198542 fatcat:pgntudon3bfu3knviat3vyxl2a

On the Autocorrelation Function of 1/f Noises

Pedro Carpena, Ana V. Coronado
2022 Mathematics  
The outputs of many real-world complex dynamical systems are time series characterized by power-law correlations and fractal properties. The first proposed model for such time series comprised fractional Gaussian noise (fGn), defined by an autocorrelation function C(k) with asymptotic power-law behavior, and a complicated power spectrum S(f) with power-law behavior in the small frequency region linked to the power-law behavior of C(k). This connection suggested the use of simpler models for power-law correlated time series: time series with power spectra of the form S(f) ∼ 1/f^β, i.e., with power-law behavior in the entire frequency range and not only near f=0 as fGn. This type of time series, known as 1/f^β noises or simply 1/f noises, can be simulated using the Fourier filtering method and has become a standard model for power-law correlated time series with a wide range of applications. However, despite the simplicity of the power spectrum of 1/f^β noises and of the known relationship between the power-law exponents of S(f) and C(k), to our knowledge, an explicit expression of C(k) for 1/f^β noises has not been previously published. In this work, we provide an analytical derivation of C(k) for 1/f^β noises, and we show the validity of our results by comparing them with the numerical results obtained from synthetically generated 1/f^β time series. We also present two applications of our results: First, we compare the autocorrelation functions of fGn and 1/f^β noises that, despite exhibiting similar power-law behavior, present some clear differences for anticorrelated cases. Secondly, we obtain the exact analytical expression of the Fluctuation Analysis algorithm when applied to 1/f^β noises.
doi:10.3390/math10091416 fatcat:axw5nsegkvdrzctwyryautillm

Isochores merit the prefix 'iso'

Wentian Li, Pedro Bernaola-Galván, Pedro Carpena, Jose L Oliver
2003 Computational biology and chemistry  
The isochore concept in human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC analysis concerning the existence of isochores is incorrect, because it applied an inappropriate statistical test. Testing the existence of isochores should be equivalent to testing the homogeneity of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is however a test of a sequence being random on the base level. For testing the existence of isochores, or homogeneity in GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by the binomial test may not be rejected by the ANOVA test.
doi:10.1016/s1476-9271(02)00090-7 pmid:12798034 fatcat:24digo234banxlqbrw6mysu7by

Specific heat of random fractal energy spectra

Ana V. Coronado, Pedro Carpena
2006 Physical Review E  
The specific heat corresponding to systems with deterministic fractal energy spectra is known to present logarithmic-periodic oscillations as a function of the temperature T in the low T region around a mean value given by a characteristic dimension of the energy spectrum. In general, it is considered that the presence of disorder does not affect strongly these results, and that the fractal structure of the energy spectrum dominates. In this paper, we study the properties of the specific heat derived from random fractal energy spectra as a function of the degree of disorder present in the spectra. To study the influence of the disorder, we analyze the specific heat using three different properties: the specific heat mean value and the periods and amplitudes of the oscillations of the specific heat around its mean value. By studying the distributions and the mean values of these three properties, we obtain that the disorder does not influence very much the mean value of the specific heat. However, concerning the behavior of periods and amplitudes, we obtain a critical value of the disorder present in the energy spectra. Below this critical value, we find a low effect of the disorder and quasideterministic behavior indicating that the fractal structure is the dominant effect, but above the critical value, the disorder dominates and the behavior of the specific heat is practically chaotic.
doi:10.1103/physreve.73.016124 pmid:16486233 fatcat:k6iyocmnavepzdmumzhjgywmum

NGSmethDB 2017: enhanced methylomes and differential methylation

Ricardo Lebrón, Cristina Gómez-Martín, Pedro Carpena, Pedro Bernaola-Galván, Guillermo Barturen, Michael Hackenberg, José L. Oliver
2016 Nucleic Acids Research  
The details of the method will be given elsewhere (Carpena et al., 'Segmenting whole-genome methylation maps', in preparation), but in essence the algorithm maximizes the difference of the mean values  ... 
doi:10.1093/nar/gkw996 pmid:27794041 pmcid:PMC5210667 fatcat:qa55agoltjgljh6bhotxum5sd4

CpGcluster: a distance-based algorithm for CpG-island detection

Michael Hackenberg, Christopher Previti, Pedro Luis Luque-Escamilla, Pedro Carpena, José Martínez-Aroza, José L Oliver
2006 BMC Bioinformatics  
Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content. Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome. CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
doi:10.1186/1471-2105-7-446 pmid:17038168 pmcid:PMC1617122 fatcat:qxgk4sjm6fby5oa6qe3zs6g4gy
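
The distance-based idea in the abstract above can be illustrated with a minimal sketch. It is not the published CpGcluster implementation: the function names (`cpg_positions`, `cluster_by_distance`) and the `max_gap` parameter are hypothetical, the distance is assumed to be the difference between CpG start coordinates, and the p-value assignment step is left out entirely.

```python
def cpg_positions(sequence):
    """0-based start coordinates of every CpG dinucleotide in the sequence."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cluster_by_distance(positions, max_gap):
    """Group consecutive CpGs whose start coordinates differ by at most
    `max_gap`. Returns half-open (start, end) intervals, each beginning
    and ending with a CpG dinucleotide. Isolated CpGs are not reported.
    """
    clusters = []
    if not positions:
        return clusters
    first = last = positions[0]
    members = 1
    for pos in positions[1:]:
        if pos - last <= max_gap:
            last = pos
            members += 1
        else:
            if members > 1:
                clusters.append((first, last + 2))
            first = last = pos
            members = 1
    if members > 1:
        clusters.append((first, last + 2))
    return clusters


if __name__ == "__main__":
    toy = "CGACG" + "T" * 10 + "CGCG"
    for start, end in cluster_by_distance(cpg_positions(toy), max_gap=5):
        print(start, end, toy[start:end])
```

Because cluster boundaries are taken from the CpG coordinates themselves, every reported interval starts and ends with a CpG, and only integer arithmetic is involved, which matches two properties the abstract highlights.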

Isochore chromosome maps of the human genome

José L Oliver, Pedro Carpena, Ramón Román-Roldán, Trinidad Mata-Balaguer, Andrés Mejı́as-Romero, Michael Hackenberg, Pedro Bernaola-Galván
2002 Gene  
The human genome is a mosaic of isochores, which are long DNA segments (≫300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs
doi:10.1016/s0378-1119(02)01034-x pmid:12468093 fatcat:356nv4a6evd5tf755szt4ab72e

Size Effects on Correlation Measures

Ana V. Coronado, Pedro Carpena
2005 Journal of biological physics (Print)  
The detection and quantification of long-range correlations in time series is a fundamental tool to characterize the properties of different dynamical systems, and is applied in many different fields, including physics, biology or engineering. Due to the diversity of applications, many techniques for measuring correlations have been designed. Here, we study systematically the influence of the length of a time series on the results obtained from several techniques commonly used to detect and quantify long-range correlations: the autocorrelation analysis, Hurst's analysis, and detrended fluctuation analysis (DFA). Using the Fourier filtering method, we generate artificial time series with known and controlled long-range correlations and with a broad range of lengths, and apply on them the different correlation measures we have studied. Our results indicate that while the DFA method is practically unaffected by the length of the time series, and almost always provides accurate results, the results from Hurst's analysis and the autocorrelation analysis strongly depend on the length of the time series.
doi:10.1007/s10867-005-3126-8 pmid:23345887 pmcid:PMC3482094 fatcat:ugimugzrnnglnc5kbg5ejy6e7a

WordCluster: detecting clusters of DNA words and genomic elements

Michael Hackenberg, Pedro Carpena, Pedro Bernaola-Galván, Guillermo Barturen, Ángel M Alganza, José L Oliver
2011 Algorithms for Molecular Biology  
Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes.
doi:10.1186/1748-7188-6-2 pmid:21261981 pmcid:PMC3037320 fatcat:epurfls7n5gvxbmtmmhpjndkja

Transforming Gaussian correlations. Applications to generating long-range power-law correlated time series with arbitrary distribution [article]

Pedro Carpena, Pedro A. Bernaola-Galván, Manuel Gómez-Extremera, Ana V. Coronado
2019 arXiv   pre-print
The observable outputs of many complex dynamical systems consist in time series exhibiting autocorrelation functions of great diversity of behaviors, including long-range power-law autocorrelation functions, as a signature of interactions operating at many temporal or spatial scales. Often, algorithms able to generate correlated noises reproducing the properties of real time series produce Gaussian outputs, while real, experimentally observed time series are often non-Gaussian, and may follow distributions with a diversity of behaviors concerning the support, the symmetry or the tail properties. Here, we study how the correlation of two Gaussian variables changes when they are transformed to follow a different destination distribution. Specifically, we consider bounded and unbounded distributions, symmetric and non-symmetric distributions, and distributions with different tail properties, from decays faster than exponential to heavy tail cases including power-laws, and we find how these properties affect the correlation of the final variables. We extend these results to Gaussian time series which are transformed to have a different marginal distribution, and show how the autocorrelation function of the final non-Gaussian time series depends on the Gaussian correlations and on the final marginal distribution. As an application of our results, we propose how to generalize standard algorithms producing Gaussian power-law correlated time series in order to create synthetic time series with arbitrary distribution and controlled power-law correlations. Finally, we show a practical example of this algorithm by generating time series mimicking the marginal distribution and the power-law tail of the autocorrelation function of a real time series: the absolute returns of stock prices.
arXiv:1909.01725v1 fatcat:gnbhgy5phffz3j3lbiprdkkxhi
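
The abstract above does not state which transformation is used, so the following is only an illustrative sketch under the assumption that each Gaussian value is mapped through the standard normal CDF and then through the inverse CDF of the destination distribution. An exponential target is chosen here purely as an example, and `gaussian_to_exponential` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist


def gaussian_to_exponential(values, rate=1.0):
    """Map standard-Gaussian values to an exponential marginal.

    Assumed recipe: u = Phi(g) is uniform on (0, 1), and
    x = -ln(1 - u) / rate is the exponential inverse CDF applied to u.
    The map is monotonic, so the rank order of the series is preserved.
    """
    std_normal = NormalDist()          # mean 0, standard deviation 1
    largest_below_one = 1.0 - 2.0 ** -53
    transformed = []
    for g in values:
        u = min(std_normal.cdf(g), largest_below_one)  # keep log1p finite
        transformed.append(-math.log1p(-u) / rate)
    return transformed


if __name__ == "__main__":
    random.seed(0)
    gaussian = [random.gauss(0.0, 1.0) for _ in range(2000)]
    exponential = gaussian_to_exponential(gaussian)
    print(sum(exponential) / len(exponential))
```

The rank order survives the mapping but the linear correlation between values generally does not, which is why the dependence of the final autocorrelation on the Gaussian correlations and on the target marginal needs the separate study the abstract describes.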