Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (≫300 kb on average) relatively homogeneous in G+C. Important genome features depend on this isochore structure; e.g., genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. By contrast, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains with an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level; the boundary between them was the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of large human contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.
doi:10.1016/s0378-1119(01)00641-2 pmid:11591471 fatcat:h5hgthldzrbidj5gxbqt2a76ki
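The core step of entropic segmentation can be sketched as a single binary split, assuming a two-symbol alphabet (S = G/C, W = A/T) and the Jensen-Shannon divergence as the quantity to maximize over all cut positions. This is a minimal illustration with hypothetical function names; the significance test and the scale-by-scale refinement of the improved method are not reproduced here.

```python
import math


def entropy(gc, n):
    """Shannon entropy (bits) of a two-symbol sequence with `gc` S-symbols out of n."""
    if n == 0:
        return 0.0
    h = 0.0
    for count in (gc, n - gc):
        if count:
            p = count / n
            h -= p * math.log2(p)
    return h


def best_cut(seq):
    """Return (cut, d_js): the cut maximizing the Jensen-Shannon divergence
    between seq[:cut] and seq[cut:] in terms of G+C (S) versus A+T (W)."""
    s = [1 if base in "GCgc" else 0 for base in seq]
    n = len(s)
    total_gc = sum(s)
    h_total = entropy(total_gc, n)
    best = (0, 0.0)
    left_gc = 0
    for cut in range(1, n):
        left_gc += s[cut - 1]  # S-count of seq[:cut]
        d = (h_total
             - (cut / n) * entropy(left_gc, cut)
             - ((n - cut) / n) * entropy(total_gc - left_gc, n - cut))
        if d > best[1]:
            best = (cut, d)
    return best


# An AT-only half followed by a GC-only half.
print(best_cut("AT" * 50 + "GC" * 50))  # (100, 1.0)
```

The divergence equals the entropy of the whole sequence minus the length-weighted entropies of the two parts, so it peaks where the two sides differ most in composition.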
The sequencing of prokaryotic genomes covering a wide taxonomic range has sparked renewed interest in intrachromosomal compositional (GC) heterogeneity, largely in view of lateral transfers. We present here a brief overview of some methods for visualizing and quantifying GC variation in prokaryotes. We used these methods to examine heterogeneity levels in sequenced prokaryotes for a range of scales or stringencies. Some species are consistently homogeneous, whereas others are markedly heterogeneous in comparison, in particular Aeropyrum pernix, Xylella fastidiosa, Mycoplasma genitalium, Enterococcus faecalis, Bacillus subtilis, Pyrobaculum aerophilum, Vibrio vulnificus chromosome I, Deinococcus radiodurans chromosome II and Halobacterium. As we discuss here, the wide range of heterogeneities calls for reexamination of an accepted belief, namely that the endogenous DNA of bacteria and archaea should typically exhibit low intrachromosomal GC contrasts. Supplementary results for all species analyzed are available at our website: http://bioinfo2.ugr.es/prok.
doi:10.1016/j.gene.2004.02.042 pmid:15177687 fatcat:4q4qi4pribe2bgosadofwr3oo4
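The abstract does not spell out its methods. Purely as a hedged illustration, the simplest way to visualize and quantify GC variation is a windowed GC profile and its spread; the window size and the use of the population standard deviation as a heterogeneity index are assumptions, not the authors' measures.

```python
import statistics


def window_gc(seq, window, step):
    """GC fraction of each window of length `window`, advancing by `step`."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        values.append((w.count("G") + w.count("C")) / window)
    return values


def gc_heterogeneity(seq, window, step=None):
    """Population standard deviation of windowed GC fraction (crude index).
    Requires at least one full window."""
    return statistics.pstdev(window_gc(seq, window, step or window))


print(window_gc("GGCCAATT", 4, 4))    # [1.0, 0.0]
print(gc_heterogeneity("GGCCAATT", 4))  # 0.5
```

Plotting `window_gc` output at several window sizes is one way to look at heterogeneity "for a range of scales"; the index depends strongly on the chosen window.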
The phylogenetic distribution of large-scale genome structure (i.e., mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences and the recently developed computational methods to analyze patchiness directly on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results: The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments on selected genome regions, computationally identified isochore-like regions were found to be the biological source of the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined isochore-like regions, throughout this wide phylogenetic range. Conclusion: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
doi:10.1186/1471-2148-8-107 pmid:18405379 pmcid:PMC2397391 fatcat:hwibysbbj5hvrhg4zeqmcdj4dq
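Local variations of the DFA scaling exponent can be read off as local slopes of log F(ℓ) against log ℓ. A minimal sketch, assuming F(ℓ) has already been computed on a set of scales; the finite-difference estimator between consecutive scales is one simple choice, not necessarily the one used in the paper.

```python
import math


def local_exponents(scales, fluct):
    """Local scaling exponents from DFA output.

    Returns (mid_scale, alpha) pairs, where alpha is the slope of log F
    versus log l between consecutive scales and mid_scale is their
    geometric mean."""
    out = []
    for (l1, f1), (l2, f2) in zip(zip(scales, fluct), zip(scales[1:], fluct[1:])):
        alpha = (math.log(f2) - math.log(f1)) / (math.log(l2) - math.log(l1))
        out.append((math.sqrt(l1 * l2), alpha))
    return out


# Synthetic F(l) with a crossover from alpha = 0.5 to alpha = 1.0 at l = 100.
scales = [10, 100, 1000, 10000]
fluct = [10 ** 0.5, 10.0, 100.0, 1000.0]
for mid, alpha in local_exponents(scales, fluct):
    print(f"l ~ {mid:.0f}: alpha = {alpha:.2f}")
```

A jump in the local exponent marks a characteristic scale, which is how large-scale patchiness shows up in this kind of analysis.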
Detrended Fluctuation Analysis (DFA) has become a standard method to quantify the correlations and scaling properties of real-world complex time series. For a given scale ℓ of observation, DFA provides the function F(ℓ), which quantifies the fluctuations of the time series around the local trend, which is subtracted (detrended). If the time series exhibits scaling properties, then F(ℓ) ∼ ℓ^α asymptotically, and the scaling exponent α is typically estimated as the slope of a linear fit in the log F(ℓ) vs. log ℓ plot. In this way, α measures the strength of the correlations and characterizes the underlying dynamical system. However, in many cases, and especially in physiological time series, the scaling behavior is different at short and long scales, resulting in log F(ℓ) vs. log ℓ plots with two different slopes: α1 at short scales and α2 at large scales of observation. These two exponents are usually associated with the existence of different mechanisms acting on the underlying dynamical system at distinct time scales. Here, however, since the power-law behavior of F(ℓ) is only asymptotic, we question the use of α1 to characterize the correlations at short scales. First, we show that, even for artificial time series with perfect scaling, i.e., with a single exponent α valid for all scales, DFA provides an α1 value that systematically overestimates the true exponent α. Second, when artificial time series with two different scaling exponents at short and large scales are considered, the α1 value provided by DFA not only can severely underestimate or overestimate the true short-scale exponent, but also depends on the value of the large-scale exponent. This behavior should preclude the use of α1 to describe the scaling properties at short scales: if DFA is applied to two time series with the same scaling behavior at short scales but very different scaling properties at large scales, very different values of α1 will be obtained, although the short-scale properties are identical. These artifacts may lead to wrong interpretations when analyzing real-world time series. On the one hand, for time series with truly perfect scaling, the spurious value of α1 could lead to the wrong conclusion that some specific mechanism acts only at short time scales in the dynamical system. On the other hand, for time series with truly different scaling at short and large scales, the incorrect α1 value would not properly characterize the short-scale behavior of the dynamical system.
doi:10.3390/e24010061 pmid:35052087 pmcid:PMC8775092 fatcat:73pb5y5ygbgata55qdylb226fm
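For reference, a minimal first-order DFA (linear detrending in non-overlapping windows) might look like the sketch below. Discarding the incomplete tail window and using a least-squares slope are common conventions assumed here, not details taken from the abstract; restricting `scales` to the smallest windows would give the α1 estimate whose bias is discussed above.

```python
import numpy as np


def dfa(x, scales):
    """First-order DFA: fluctuation function F(l) for each window size l."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())          # integrated, mean-subtracted series
    F = []
    for ell in scales:
        t = np.arange(ell)
        residuals = []
        for i in range(profile.size // ell):   # non-overlapping windows; tail discarded
            seg = profile[i * ell:(i + 1) * ell]
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # local linear trend
            residuals.append(np.mean((seg - trend) ** 2))
        F.append(np.sqrt(np.mean(residuals)))
    return np.array(F)


def dfa_exponent(x, scales):
    """Estimate alpha as the least-squares slope of log F(l) versus log l."""
    F = dfa(x, scales)
    return float(np.polyfit(np.log(scales), np.log(F), 1)[0])


rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 14)
scales = np.array([16, 32, 64, 128, 256, 512])
print(dfa_exponent(noise, scales))             # close to 0.5 for white noise
print(dfa_exponent(np.cumsum(noise), scales))  # close to 1.5 for its random walk
```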
and Oliver 2003; Li, Bernaola-Galvan, Haghighi and Grosse 2002; Oliver, Bernaola-Galvan, Carpena and Roman-Roldan 2001; Oliver, Carpena, Hackenberg and Bernaola-Galvan 2004; Oliver, Carpena, Roman-Roldan ... versions of the algorithm (Bernaola-Galván, Román-Roldán and Oliver 1996; Oliver, Bernaola-Galvan, Carpena and Roman-Roldan 2001; Oliver, Carpena, Roman-Roldan, Mata-Balaguer, Mejias-Romero, Hackenberg ...
arXiv:0806.1292v1 fatcat:uzvj7yklwfdejmuw6t2tdvnomy
24. Newsom, H. E. et al. The depletion of tungsten in the bulk silicate earth: Constraints on core formation.
doi:10.1038/nature00948 pmid:12198542 fatcat:pgntudon3bfu3knviat3vyxl2a
The outputs of many real-world complex dynamical systems are time series characterized by power-law correlations and fractal properties. The first proposed model for such time series was fractional Gaussian noise (fGn), defined by an autocorrelation function C(k) with asymptotic power-law behavior and a complicated power spectrum S(f) whose power-law behavior in the small-frequency region is linked to the power-law behavior of C(k). This connection suggested the use of simpler models for power-law correlated time series: time series with power spectra of the form S(f) ∼ 1/f^β, i.e., with power-law behavior over the entire frequency range and not only near f = 0 as in fGn. This type of time series, known as 1/f^β noises or simply 1/f noises, can be simulated using the Fourier filtering method and has become a standard model for power-law correlated time series with a wide range of applications. However, despite the simplicity of the power spectrum of 1/f^β noises and the known relationship between the power-law exponents of S(f) and C(k), to our knowledge, an explicit expression of C(k) for 1/f^β noises has not been previously published. In this work, we provide an analytical derivation of C(k) for 1/f^β noises, and we show the validity of our results by comparing them with numerical results obtained from synthetically generated 1/f^β time series. We also present two applications of our results. First, we compare the autocorrelation functions of fGn and 1/f^β noises, which, despite exhibiting similar power-law behavior, present some clear differences for anticorrelated cases. Second, we obtain the exact analytical expression of the Fluctuation Analysis algorithm when applied to 1/f^β noises.
doi:10.3390/math10091416 fatcat:axw5nsegkvdrzctwyryautillm
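The Fourier filtering method can be sketched as follows: assign random phases to Fourier amplitudes proportional to f^(-β/2), so that |X(f)|² ∝ 1/f^β, and invert the transform. This sketch assumes deterministic amplitudes, a zero-mean output, and unit-variance normalization; the empirical C(k) it computes is the kind of numerical result one would compare with an analytical expression.

```python
import numpy as np


def fourier_filtering(n, beta, seed=None):
    """Generate a 1/f^beta noise of length n (zero mean, unit variance).

    Amplitudes are set to f^(-beta/2) so that the power spectrum
    |X(f)|^2 scales as 1/f^beta; phases are uniformly random."""
    rng = np.random.default_rng(seed)
    f = np.fft.rfftfreq(n)                 # 0, 1/n, ..., 1/2
    amp = np.zeros_like(f)
    amp[1:] = f[1:] ** (-beta / 2.0)       # skip f = 0 (zero mean)
    phases = rng.uniform(0.0, 2.0 * np.pi, f.size)
    x = np.fft.irfft(amp * np.exp(1j * phases), n)
    return (x - x.mean()) / x.std()


def autocorrelation(x, max_lag):
    """Empirical normalized autocorrelation C(k) for k = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.mean(x * x)
    return np.array([np.mean(x[: x.size - k] * x[k:]) / var
                     for k in range(max_lag + 1)])


x = fourier_filtering(2 ** 14, beta=0.6, seed=1)
print(autocorrelation(x, 5))  # positive, slowly decaying C(k) for 0 < beta < 1
```

Setting β = 0 gives an uncorrelated series, while β close to 2 gives a strongly persistent one.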
The isochore concept in the human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC analysis concerning the existence of isochores is incorrect, because it applied an inappropriate statistical test. Testing for the existence of isochores should be equivalent to testing the homogeneity of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is however a test of a sequence being random at the base level. For testing the existence of isochores, or homogeneity in GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by the binomial test may not be rejected by the ANOVA test.
doi:10.1016/s1476-9271(02)00090-7 pmid:12798034 fatcat:24digo234banxlqbrw6mysu7by
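To make the proposed ANOVA concrete: split the sequence into windows, split each window into sub-windows, and compare the between-window variance of sub-window GC% with the within-window variance. Below is a dependency-free sketch of the F statistic; the window and sub-window sizes are hypothetical parameters, and the p-value (available, for example, from `scipy.stats.f_oneway`) is omitted.

```python
def gc_groups(seq, window, sub):
    """GC fraction of each sub-window, grouped by the window that contains it."""
    seq = seq.upper()
    groups = []
    for w0 in range(0, len(seq) - window + 1, window):
        group = []
        for s0 in range(w0, w0 + window - sub + 1, sub):
            chunk = seq[s0:s0 + sub]
            group.append((chunk.count("G") + chunk.count("C")) / sub)
        groups.append(group)
    return groups


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square.
    Undefined (division by zero) if every group is internally constant."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

For a sequence, `anova_f(gc_groups(seq, window, sub))` compares windows with each other rather than testing each base against a random model, which is the distinction the abstract draws with the binomial test.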
Physical Review E
The specific heat of systems with deterministic fractal energy spectra is known to present log-periodic oscillations as a function of the temperature T in the low-T region, around a mean value given by a characteristic dimension of the energy spectrum. It is generally considered that the presence of disorder does not strongly affect these results, and that the fractal structure of the energy spectrum dominates. In this paper, we study the properties of the specific heat derived from random fractal energy spectra as a function of the degree of disorder present in the spectra. To study the influence of the disorder, we analyze three properties of the specific heat: its mean value, and the periods and amplitudes of its oscillations around that mean value. By studying the distributions and the mean values of these three properties, we find that the disorder has little influence on the mean value of the specific heat. However, for the periods and amplitudes, we find a critical value of the disorder present in the energy spectra. Below this critical value, the disorder has a weak effect and the behavior is quasideterministic, indicating that the fractal structure is the dominant effect; above the critical value, the disorder dominates and the behavior of the specific heat is practically chaotic.
doi:10.1103/physreve.73.016124 pmid:16486233 fatcat:k6iyocmnavepzdmumzhjgywmum
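The quantity studied is the canonical specific heat of a discrete spectrum, C(T) = (⟨E²⟩ - ⟨E⟩²)/T² with k_B = 1. The sketch below evaluates it for a middle-thirds Cantor spectrum. The `disorder` parameter, which randomly jitters the scaling ratio of each interval, is a hypothetical stand-in for the random fractal spectra of the paper, whose construction is not given in the abstract.

```python
import numpy as np


def cantor_levels(generations, disorder=0.0, seed=None):
    """Energy levels at the midpoints of a Cantor-like construction on [0, 1].

    Each interval keeps two end pieces of relative width r = (1 + disorder*u)/3,
    u uniform in [-1, 1]; disorder = 0 gives the middle-thirds Cantor set.
    Keep disorder <= 0.5 so the two pieces never overlap (hypothetical model)."""
    rng = np.random.default_rng(seed)
    intervals = [(0.0, 1.0)]
    for _ in range(generations):
        nxt = []
        for a, b in intervals:
            r = (1.0 + disorder * rng.uniform(-1.0, 1.0)) / 3.0
            w = r * (b - a)
            nxt += [(a, a + w), (b - w, b)]
        intervals = nxt
    return np.array([0.5 * (a + b) for a, b in intervals])


def specific_heat(levels, temps):
    """Canonical specific heat C(T) = (<E^2> - <E>^2) / T^2, with k_B = 1."""
    E = np.asarray(levels, dtype=float)
    E = E - E.min()                            # ground state at 0 avoids overflow
    T = np.asarray(temps, dtype=float)
    w = np.exp(-E[None, :] / T[:, None])       # Boltzmann weights, shape (nT, nE)
    Z = w.sum(axis=1)
    e1 = (w * E).sum(axis=1) / Z
    e2 = (w * E ** 2).sum(axis=1) / Z
    return (e2 - e1 ** 2) / T ** 2


levels = cantor_levels(10)
temps = np.logspace(-4, -1, 7)
print(specific_heat(levels, temps))
```

For the deterministic spectrum, C(T) and C(3T) nearly coincide in the low-T region, which is the log-periodicity (period ln 3 in ln T) mentioned in the abstract; the mean value is expected near the Cantor dimension ln 2 / ln 3 ≈ 0.63.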
The details of the method will be given elsewhere (Carpena et al., 'Segmenting whole-genome methylation maps', in preparation), but in essence the algorithm maximizes the difference of the mean values ...
doi:10.1093/nar/gkw996 pmid:27794041 pmcid:PMC5210667 fatcat:qa55agoltjgljh6bhotxum5sd4
Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content. Given the higher frequency of CpG dinucleotides at CGIs compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to directly predict clusters of CpGs without depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping the Transcription Start Site (TSS) show the highest statistical significance compared to islands in other genome locations, qualifying CpGcluster as a valuable tool for discriminating functional CGIs from the remaining islands in the bulk genome. CpGcluster uses only integer arithmetic, making it a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. In contrast to previous algorithms, the only search parameter in CpGcluster is the distance between two consecutive CpGs. Therefore, none of the main statistical properties of CpG islands (G+C content, CpG fraction or length threshold) are needed as search parameters, which may account for the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
doi:10.1186/1471-2105-7-446 pmid:17038168 pmcid:PMC1617122 fatcat:qxgk4sjm6fby5oa6qe3zs6g4gy
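The distance-based idea can be sketched as follows: locate every CpG, then merge consecutive CpGs whose distance does not exceed a threshold, so that each cluster begins and ends on a CpG. How CpGcluster chooses its distance threshold and assigns p-values is not described in the abstract, so `max_dist` and `min_cpgs` here are hypothetical user parameters and no p-value is computed.

```python
def cpg_positions(seq):
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_clusters(seq, max_dist, min_cpgs=2):
    """Group consecutive CpGs whose distance is <= max_dist.

    Returns half-open (start, end) intervals that start at the C of the
    first CpG and end just after the G of the last CpG."""
    pos = cpg_positions(seq)
    clusters, start = [], 0
    for i in range(1, len(pos) + 1):
        # Close the current run at the end of the list or at a large gap.
        if i == len(pos) or pos[i] - pos[i - 1] > max_dist:
            if i - start >= min_cpgs:
                clusters.append((pos[start], pos[i - 1] + 2))
            start = i
    return clusters


seq = "CGACG" + "T" * 16 + "CGAACG"
print(cpg_clusters(seq, max_dist=5))  # [(0, 5), (21, 27)]
```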
The human genome is a mosaic of isochores, which are long DNA segments (≫300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features; e.g., genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pairwise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single-base-pair resolution; and (4) both gradual and abrupt isochore boundaries are revealed simultaneously. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, of Alu and long interspersed nuclear element repeats, and of the different types of single nucleotide polymorphisms on LHGRs also coincides with expectations for true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs
doi:10.1016/s0378-1119(02)01034-x pmid:12468093 fatcat:356nv4a6evd5tf755szt4ab72e
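Multi-scale maps of this kind can be sketched by applying an entropic split recursively. In the hypothetical version below, a fixed threshold on the Jensen-Shannon divergence and a minimum segment length stand in for the statistical significance criterion of the actual method, which the abstract does not detail.

```python
import math


def _entropy(gc, n):
    """Shannon entropy (bits) of a two-symbol (S/W) composition."""
    h = 0.0
    for count in (gc, n - gc):
        if count:
            p = count / n
            h -= p * math.log2(p)
    return h


def segment(seq, threshold, min_len=10):
    """Recursive entropic segmentation (hypothetical simplified criterion).

    Splits at the cut maximizing the Jensen-Shannon divergence D_JS of G+C
    content, and recurses while D_JS exceeds `threshold` and both parts are
    at least `min_len` (>= 1) long. Returns sorted boundary positions."""
    s = [1 if base in "GCgc" else 0 for base in seq]
    bounds = []

    def split(lo, hi):
        n = hi - lo
        if n < 2 * min_len:
            return
        total = sum(s[lo:hi])
        h = _entropy(total, n)
        best_cut, best_d = None, threshold
        left = sum(s[lo:lo + min_len - 1])
        for cut in range(lo + min_len, hi - min_len + 1):
            left += s[cut - 1]                 # S-count of s[lo:cut]
            k = cut - lo
            d = h - (k / n) * _entropy(left, k) - ((n - k) / n) * _entropy(total - left, n - k)
            if d > best_d:
                best_cut, best_d = cut, d
        if best_cut is not None:
            bounds.append(best_cut)
            split(lo, best_cut)
            split(best_cut, hi)

    split(0, len(s))
    return sorted(bounds)


print(segment("A" * 100 + "G" * 100 + "A" * 100, threshold=0.1))  # [100, 200]
```

Lowering the threshold exposes finer structure, which is one way to show heterogeneities at different scales in the same map.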
The detection and quantification of long-range correlations in time series is a fundamental tool for characterizing the properties of different dynamical systems, and is applied in many fields, including physics, biology and engineering. Due to the diversity of applications, many techniques for measuring correlations have been designed. Here, we study systematically the influence of the length of a time series on the results obtained from several techniques commonly used to detect and quantify long-range correlations: autocorrelation analysis, Hurst's analysis, and detrended fluctuation analysis (DFA). Using the Fourier filtering method, we generate artificial time series with known, controlled long-range correlations and a broad range of lengths, and apply the different correlation measures to them. Our results indicate that while the DFA method is practically unaffected by the length of the time series and almost always provides accurate results, the results of Hurst's analysis and autocorrelation analysis strongly depend on the length of the time series.
doi:10.1007/s10867-005-3126-8 pmid:23345887 pmcid:PMC3482094 fatcat:ugimugzrnnglnc5kbg5ejy6e7a
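Of the three techniques compared, Hurst's rescaled-range (R/S) analysis is perhaps the least familiar. Below is a minimal sketch of the classical estimator, averaging R/S over non-overlapping windows and fitting log(R/S) against log n; it is not necessarily the exact variant used in the study.

```python
import numpy as np


def rescaled_range(x, n):
    """Mean R/S statistic over non-overlapping windows of length n."""
    x = np.asarray(x, dtype=float)
    values = []
    for i in range(x.size // n):
        w = x[i * n:(i + 1) * n]
        y = np.cumsum(w - w.mean())            # cumulative deviations from the window mean
        s = w.std()
        if s > 0:
            values.append((y.max() - y.min()) / s)
    return float(np.mean(values))


def hurst_exponent(x, sizes):
    """Hurst exponent H as the slope of log(R/S) versus log(n)."""
    rs = [rescaled_range(x, n) for n in sizes]
    return float(np.polyfit(np.log(sizes), np.log(rs), 1)[0])


rng = np.random.default_rng(0)
sizes = [16, 32, 64, 128, 256, 512, 1024]
for length in (2 ** 11, 2 ** 15):
    print(length, hurst_exponent(rng.standard_normal(length), sizes))
```

Comparing the two estimates for white noise (true H = 0.5) is one way to probe the length dependence reported above; the classical R/S estimate tends to be biased upward when small windows dominate the fit.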
Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method as a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method as a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for the functional analysis of overlapped genes.
doi:10.1186/1748-7188-6-2 pmid:21261981 pmcid:PMC3037320 fatcat:epurfls7n5gvxbmtmmhpjndkja
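A rough sketch of distance-based word clustering with a significance score follows. The null model is an assumption: occurrences are treated as independent Bernoulli events at the sequence-wide rate p, and a cluster of m copies spanning L start positions is scored by P(X ≥ m) with X ~ Binomial(L, p). WordCluster's actual statistics may differ, and all names here are illustrative.

```python
import math


def word_positions(seq, word):
    """Start positions of (possibly overlapping) occurrences of word in seq."""
    seq, word = seq.upper(), word.upper()
    return [i for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)]


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(1.0, total)


def word_clusters(seq, word, max_dist):
    """Clusters of >= 2 consecutive copies with gaps <= max_dist, as
    (start, end, p_value) with half-open [start, end) coordinates."""
    pos = word_positions(seq, word)
    p = len(pos) / max(1, len(seq) - len(word) + 1)   # background occurrence rate
    out, start = [], 0
    for i in range(1, len(pos) + 1):
        if i == len(pos) or pos[i] - pos[i - 1] > max_dist:
            if i - start >= 2:
                span = pos[i - 1] - pos[start] + 1     # start positions covered
                out.append((pos[start], pos[i - 1] + len(word),
                            binomial_tail(i - start, span, p)))
            start = i
    return out


seq = "CAG" + "T" * 3 + "CAG" + "T" * 50 + "CAG"
print(word_clusters(seq, "CAG", max_dist=10))
```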
The observable outputs of many complex dynamical systems consist of time series exhibiting autocorrelation functions with a great diversity of behaviors, including long-range power-law autocorrelation functions as a signature of interactions operating at many temporal or spatial scales. Algorithms able to generate correlated noises reproducing the properties of real time series often produce Gaussian outputs, while real, experimentally observed time series are often non-Gaussian and may follow distributions with a diversity of behaviors concerning the support, the symmetry or the tail properties. Here, we study how the correlation of two Gaussian variables changes when they are transformed to follow a different destination distribution. Specifically, we consider bounded and unbounded distributions, symmetric and non-symmetric distributions, and distributions with different tail properties, from decays faster than exponential to heavy-tailed cases including power laws, and we find how these properties affect the correlation of the final variables. We extend these results to Gaussian time series that are transformed to have a different marginal distribution, and show how the autocorrelation function of the final non-Gaussian time series depends on the Gaussian correlations and on the final marginal distribution. As an application of our results, we propose how to generalize standard algorithms producing Gaussian power-law correlated time series in order to create synthetic time series with arbitrary distribution and controlled power-law correlations. Finally, we show a practical example of this algorithm by generating time series mimicking the marginal distribution and the power-law tail of the autocorrelation function of a real time series: the absolute returns of stock prices.
arXiv:1909.01725v1 fatcat:gnbhgy5phffz3j3lbiprdkkxhi
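The transformation studied here is the standard Gaussian-to-target mapping x = F⁻¹(Φ(z)), with Φ the standard normal CDF and F the destination CDF. The sketch below uses an exponential destination, for which F⁻¹(u) = -ln(1 - u), and measures how the Pearson correlation of a Gaussian pair changes; the choice of distribution, correlation and sample size is illustrative.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)


def gaussian_to_exponential(z):
    """Map standard Gaussian values to Exp(1) via x = F^-1(Phi(z)) = -ln(1 - Phi(z)).

    1 - Phi(z) is computed as erfc(z / sqrt(2)) / 2 to avoid log(0) for large z."""
    z = np.asarray(z, dtype=float)
    return -np.log(0.5 * _erfc(z / math.sqrt(2.0)))


def correlated_gaussians(rho, n, seed=None):
    """Two standard Gaussian samples with Pearson correlation rho."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    return z1, z2


z1, z2 = correlated_gaussians(0.8, 200_000, seed=0)
x1, x2 = gaussian_to_exponential(z1), gaussian_to_exponential(z2)
print("Gaussian correlation:   ", np.corrcoef(z1, z2)[0, 1])
print("Exponential correlation:", np.corrcoef(x1, x2)[0, 1])
```

Because the transform is monotone but nonlinear, the Pearson correlation of the exponential pair comes out somewhat below the Gaussian one; applying the same mapping point by point to a correlated Gaussian time series changes its autocorrelation function in the same way.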