### Isochore chromosome maps of eukaryotic genomes

2001 *Gene*

Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (≫300 kb on average) relatively homogeneous in G+C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of

doi:10.1016/s0378-1119(01)00641-2
pmid:11591471
fatcat:h5hgthldzrbidj5gxbqt2a76ki

... romosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. On the contrary, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.
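As a rough illustration of the idea behind entropic segmentation, the sketch below finds the single cut point that maximizes the Jensen-Shannon divergence between the G+C content of the two resulting halves. It is not the authors' implementation: the divergence is the measure usually associated with entropic segmentation rather than something stated in the abstract, the function names `entropy` and `best_cut` are hypothetical, and the significance test and recursive splitting of the published method are left out. The sequence is assumed to have at least two bases.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a two-symbol distribution with P(G or C) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def best_cut(seq):
    """Return (cut, jsd) where seq[:cut] and seq[cut:] differ most in G+C content.

    jsd is the Jensen-Shannon divergence between the two halves, weighted by
    their lengths. Hypothetical helper; assumes len(seq) >= 2.
    """
    n = len(seq)
    strong = [1 if base in "GCgc" else 0 for base in seq]
    total = sum(strong)
    h_all = entropy(total / n)

    best_i, best_d = None, -1.0
    left = 0  # number of G/C bases in seq[:i]
    for i in range(1, n):
        left += strong[i - 1]
        right = total - left
        d = (h_all
             - (i / n) * entropy(left / i)
             - ((n - i) / n) * entropy(right / (n - i)))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

Applied recursively to each half, and stopped when the best divergence is no longer statistically significant, a procedure of this kind yields a partition into compositionally homogeneous domains; the stopping criterion is the part this sketch leaves open.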
### Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes

2004 *Gene*

The sequencing of prokaryotic genomes covering a wide taxonomic range has sparked renewed interest in intrachromosomal compositional (GC) heterogeneity, largely in view of lateral transfers. We present here a brief overview of some methods for visualizing and quantifying GC variation in prokaryotes. We used these methods to examine heterogeneity levels in sequenced prokaryotes, for a range of scales or stringencies. Some species are consistently homogeneous, whereas others are markedly

doi:10.1016/j.gene.2004.02.042
pmid:15177687
fatcat:4q4qi4pribe2bgosadofwr3oo4

... eous in comparison, in particular Aeropyrum pernix, Xylella fastidiosa, Mycoplasma genitalium, Enterococcus faecalis, Bacillus subtilis, Pyrobaculum aerophilum, Vibrio vulnificus chromosome I, Deinococcus radiodurans chromosome II and Halobacterium. As we discuss here, the wide range of heterogeneities calls for reexamination of an accepted belief, namely that the endogenous DNA of bacteria and archaea should typically exhibit low intrachromosomal GC contrasts. Supplementary results for all species analyzed are available at our website: http://bioinfo2.ugr.es/prok.
### Phylogenetic distribution of large-scale genome patchiness

2008 *BMC Evolutionary Biology*

The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results: The local variations in the scaling exponent of the

doi:10.1186/1471-2148-8-107
pmid:18405379
pmcid:PMC2397391
fatcat:hwibysbbj5hvrhg4zeqmcdj4dq

... ed Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
### On the Validity of Detrended Fluctuation Analysis at Short Scales

2021 *Entropy*

Detrended Fluctuation Analysis (DFA) has become a standard method to quantify the correlations and scaling properties of real-world complex time series. For a given scale ℓ of observation, DFA provides the function F(ℓ), which quantifies the fluctuations of the time series around the local trend, which is subtracted (detrended). If the time series exhibits scaling properties, then F(ℓ) ∼ ℓ^α asymptotically, and the scaling exponent α is typically estimated as the slope of a linear fitting in the

doi:10.3390/e24010061
pmid:35052087
pmcid:PMC8775092
fatcat:73pb5y5ygbgata55qdylb226fm

... ogF(ℓ) vs. log(ℓ) plot. In this way, α measures the strength of the correlations and characterizes the underlying dynamical system. However, in many cases, and especially in a physiological time series, the scaling behavior is different at short and long scales, resulting in log F(ℓ) vs. log(ℓ) plots with two different slopes, α₁ at short scales and α₂ at large scales of observation. These two exponents are usually associated with the existence of different mechanisms that work at distinct time scales acting on the underlying dynamical system. Here, however, and since the power-law behavior of F(ℓ) is asymptotic, we question the use of α₁ to characterize the correlations at short scales. To this end, we show first that, even for artificial time series with perfect scaling, i.e., with a single exponent α valid for all scales, DFA provides an α₁ value that systematically overestimates the true exponent α. In addition, second, when artificial time series with two different scaling exponents at short and large scales are considered, the α₁ value provided by DFA not only can severely underestimate or overestimate the true short-scale exponent, but also depends on the value of the large scale exponent. This behavior should prevent the use of α₁ to describe the scaling properties at short scales: if DFA is used in two time series with the same scaling behavior at short scales but very different scaling properties at large scales, very different values of α₁ will be obtained, although the short scale properties are identical. These artifacts may lead to wrong interpretations when analyzing real-world time series: on the one hand, for time series with truly perfect scaling, the spurious value of α₁ could lead to wrongly thinking that there exists some specific mechanism acting only at short time scales in the dynamical system. On the other hand, for time series with true different scaling at short and large scales, the incorrect α₁ value would not characterize properly the short scale behavior of the dynamical system.
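To make the quantities in this abstract concrete, here is a minimal sketch of first-order DFA in plain Python: it integrates the series, fits a straight line in each non-overlapping window of length ℓ, and takes the root-mean-square residual as F(ℓ); the exponent is then the slope of log F(ℓ) against log ℓ. The function names and the choice of non-overlapping windows are assumptions of this sketch, not details taken from the paper, and the series must be at least as long as the largest scale.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_fluctuation(series, scale):
    """F(scale): RMS residual of the integrated profile around a linear
    trend fitted separately in each non-overlapping window."""
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for value in series:
        acc += value - mean
        profile.append(acc)

    xs = list(range(scale))
    sq_sum, count = 0.0, 0
    for start in range(0, len(profile) - scale + 1, scale):
        window = profile[start:start + scale]
        slope, intercept = linear_fit(xs, window)
        sq_sum += sum((y - (slope * x + intercept)) ** 2
                      for x, y in zip(xs, window))
        count += scale
    return math.sqrt(sq_sum / count)


def dfa_exponent(series, scales):
    """Slope of log F(scale) versus log(scale) over the given scales."""
    log_scales = [math.log(s) for s in scales]
    log_f = [math.log(dfa_fluctuation(series, s)) for s in scales]
    slope, _ = linear_fit(log_scales, log_f)
    return slope


def white_noise(n, seed=0):
    """Uncorrelated Gaussian test series (hypothetical helper for experiments)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

Fitting `dfa_exponent` separately over a short-scale range and a long-scale range gives the two slopes the abstract calls α₁ and α₂; for uncorrelated noise the expected asymptotic value is 0.5.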
### A standalone version of IsoFinder for the computational prediction of isochores in genome sequences

2008 *arXiv*, pre-print [article]

and Oliver 2003; Li, Bernaola-Galvan, Haghighi and Grosse 2002; Oliver, Bernaola-Galvan,

arXiv:0806.1292v1
fatcat:uzvj7yklwfdejmuw6t2tdvnomy

*Carpena* and Roman-Roldan 2001; Oliver, *Carpena*, Hackenberg and Bernaola-Galvan 2004; Oliver, *Carpena*, Roman-Roldan ... versions of the algorithm (Bernaola-Galván, Román-Roldán and Oliver 1996; Oliver, Bernaola-Galvan, *Carpena* and Roman-Roldan 2001; Oliver, *Carpena*, Roman-Roldan, Mata-Balaguer, Mejias-Romero, Hackenberg ...
### Erratum: Retraction Note to: Metal–insulator transition in chains with correlated disorder

2002 *Nature*

24. Newsom, H. E. et al. The depletion of tungsten in the bulk silicate earth: Constraints on core formation.

doi:10.1038/nature00948
pmid:12198542
fatcat:pgntudon3bfu3knviat3vyxl2a
### On the Autocorrelation Function of 1/f Noises

2022 *Mathematics*

The outputs of many real-world complex dynamical systems are time series characterized by power-law correlations and fractal properties. The first proposed model for such time series comprised fractional Gaussian noise (fGn), defined by an autocorrelation function C(k) with asymptotic power-law behavior, and a complicated power spectrum S(f) with power-law behavior in the small frequency region linked to the power-law behavior of C(k). This connection suggested the use of simpler models for

doi:10.3390/math10091416
fatcat:axw5nsegkvdrzctwyryautillm

... r-law correlated time series: time series with power spectra of the form S(f) ∼ 1/f^β, i.e., with power-law behavior in the entire frequency range and not only near f=0 as fGn. This type of time series, known as 1/f^β noises or simply 1/f noises, can be simulated using the Fourier filtering method and has become a standard model for power-law correlated time series with a wide range of applications. However, despite the simplicity of the power spectrum of 1/f^β noises and of the known relationship between the power-law exponents of S(f) and C(k), to our knowledge, an explicit expression of C(k) for 1/f^β noises has not been previously published. In this work, we provide an analytical derivation of C(k) for 1/f^β noises, and we show the validity of our results by comparing them with the numerical results obtained from synthetically generated 1/f^β time series. We also present two applications of our results: First, we compare the autocorrelation functions of fGn and 1/f^β noises that, despite exhibiting similar power-law behavior, present some clear differences for anticorrelated cases. Secondly, we obtain the exact analytical expression of the Fluctuation Analysis algorithm when applied to 1/f^β noises.
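For orientation, the "known relationship between the power-law exponents" mentioned in this abstract is usually written as below. This is the standard asymptotic relation for positively correlated series, not the explicit expression for C(k) derived in the paper, and the symbol γ for the decay exponent of C(k) is an assumption of this note.

```latex
S(f) \sim f^{-\beta}, \qquad C(k) \sim k^{-\gamma}, \qquad \beta = 1 - \gamma \quad (0 < \gamma < 1)
```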
### Isochores merit the prefix 'iso'

2003 *Computational biology and chemistry*

The isochore concept in human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC analysis concerning the existence of isochores is incorrect, because it had applied an inappropriate statistical test. To test the existence of isochores should be equivalent to a test of homogeneity of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is however a test of a

doi:10.1016/s1476-9271(02)00090-7
pmid:12798034
fatcat:24digo234banxlqbrw6mysu7by

... e being random on the base level. For testing the existence of isochores, or homogeneity in GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by binomial test may not be rejected by the ANOVA test.
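A small sketch of what an ANOVA on windowed GC% can look like: the sequence is cut into fixed-size windows, the windows are gathered into groups, and a one-way F statistic compares the variation between group means with the variation within groups. The window size, the way windows are grouped, and the function names are all assumptions of this sketch; the abstract does not specify them, and converting F into a p-value (which needs the F distribution) is left out.

```python
def windowed_gc(seq, size):
    """GC fraction of each complete, non-overlapping window of the given size."""
    values = []
    for start in range(0, len(seq) - size + 1, size):
        window = seq[start:start + size]
        values.append(sum(1 for base in window if base in "GCgc") / size)
    return values


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers.

    Assumes at least two groups and non-zero within-group variation.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F suggests that the group means of GC% differ more than the window-to-window scatter would explain; a small F is compatible with homogeneity at the chosen window size.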
### Specific heat of random fractal energy spectra

2006 *Physical Review E*

The specific heat corresponding to systems with deterministic fractal energy spectra is known to present logarithmic-periodic oscillations as a function of the temperature T in the low T region around a mean value given by a characteristic dimension of the energy spectrum. In general, it is considered that the presence of disorder does not affect strongly these results, and that the fractal structure of the energy spectrum dominates. In this paper, we study the properties of the specific heat

doi:10.1103/physreve.73.016124
pmid:16486233
fatcat:k6iyocmnavepzdmumzhjgywmum

... rived from random fractal energy spectra as a function of the degree of disorder present in the spectra. To study the influence of the disorder, we analyze the specific heat using three different properties: the specific heat mean value and the periods and amplitudes of the oscillations of the specific heat around its mean value. By studying the distributions and the mean values of these three properties, we obtain that the disorder does not influence very much the mean value of the specific heat. However, concerning the behavior of periods and amplitudes, we obtain a critical value of the disorder present in the energy spectra. Below this critical value, we find a low effect of the disorder and quasideterministic behavior indicating that the fractal structure is the dominant effect, but above the critical value, the disorder dominates and the behavior of the specific heat is practically chaotic.
### NGSmethDB 2017: enhanced methylomes and differential methylation

2016 *Nucleic Acids Research*

The details of the method will be given elsewhere (

doi:10.1093/nar/gkw996
pmid:27794041
pmcid:PMC5210667
fatcat:qa55agoltjgljh6bhotxum5sd4

*Carpena* et al., 'Segmenting whole-genome methylation maps', in preparation), but in essence the algorithm maximizes the difference of the mean values ...
### CpGcluster: a distance-based algorithm for CpG-island detection

2006 *BMC Bioinformatics*

Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content. Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new

doi:10.1186/1471-2105-7-446
pmid:17038168
pmcid:PMC1617122
fatcat:qxgk4sjm6fby5oa6qe3zs6g4gy

... orithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome. CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
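The distance-based grouping described here can be sketched in a few lines: locate every CpG, then chain consecutive CpGs into one cluster as long as the gap between them does not exceed a single distance threshold. This is only an illustration of the clustering step under stated assumptions: the function names are hypothetical, the distance is measured start-to-start (the tool's exact definition is not given in the abstract), and the p-value assignment that turns clusters into predicted CGIs is omitted.

```python
def cpg_positions(seq):
    """0-based start positions of every CG dinucleotide in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def distance_clusters(positions, max_dist):
    """Chain sorted positions into clusters whose consecutive members are at
    most max_dist apart; clusters need at least two CpGs.

    Returns (start, end) pairs with inclusive 0-based coordinates, so every
    cluster starts at the C of its first CpG and ends at the G of its last.
    """
    clusters, current = [], []
    for pos in positions:
        if current and pos - current[-1] > max_dist:
            if len(current) > 1:
                clusters.append((current[0], current[-1] + 1))
            current = []
        current.append(pos)
    if len(current) > 1:
        clusters.append((current[0], current[-1] + 1))
    return clusters
```

Only integer comparisons are involved, which is in the spirit of the abstract's remark about integer arithmetic, and by construction each cluster begins and ends with a CpG dinucleotide.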
### Isochore chromosome maps of the human genome

2002 *Gene*

The human genome is a mosaic of isochores, which are long DNA segments (≫300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure

doi:10.1016/s0378-1119(02)01034-x
pmid:12468093
fatcat:356nv4a6evd5tf755szt4ab72e

... of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs
### Size Effects on Correlation Measures

2005 *Journal of biological physics (Print)*

The detection and quantification of long-range correlations in time series is a fundamental tool to characterize the properties of different dynamical systems, and is applied in many different fields, including physics, biology or engineering. Due to the diversity of applications, many techniques for measuring correlations have been designed. Here, we study systematically the influence of the length of a time series on the results obtained from several techniques commonly used to detect and

doi:10.1007/s10867-005-3126-8
pmid:23345887
pmcid:PMC3482094
fatcat:ugimugzrnnglnc5kbg5ejy6e7a

... tify long-range correlations: the autocorrelation analysis, Hurst's analysis, and detrended fluctuation analysis (DFA). Using the Fourier filtering method, we generate artificial time series with known and controlled long-range correlations and with a broad range of lengths, and apply on them the different correlation measures we have studied. Our results indicate that while the DFA method is practically unaffected by the length of the time series, and almost always provides accurate results, the results from Hurst's analysis and the autocorrelation analysis strongly depend on the length of the time series.
### WordCluster: detecting clusters of DNA words and genomic elements

2011 *Algorithms for Molecular Biology*

Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results: We introduce here an algorithm to detect clusters

doi:10.1186/1748-7188-6-2
pmid:21261981
pmcid:PMC3037320
fatcat:epurfls7n5gvxbmtmmhpjndkja

... of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes.
### Transforming Gaussian correlations. Applications to generating long-range power-law correlated time series with arbitrary distribution

2019 *arXiv*, pre-print [article]

The observable outputs of many complex dynamical systems consist in time series exhibiting autocorrelation functions of great diversity of behaviors, including long-range power-law autocorrelation functions, as a signature of interactions operating at many temporal or spatial scales. Often, algorithms able to generate correlated noises reproducing the properties of real time series produce Gaussian outputs, while real, experimentally observed time series are often non-Gaussian, and may follow

arXiv:1909.01725v1
fatcat:gnbhgy5phffz3j3lbiprdkkxhi

... stributions with a diversity of behaviors concerning the support, the symmetry or the tail properties. Here, we study how the correlation of two Gaussian variables changes when they are transformed to follow a different destination distribution. Specifically, we consider bounded and unbounded distributions, symmetric and non-symmetric distributions, and distributions with different tail properties, from decays faster than exponential to heavy tail cases including power-laws, and we find how these properties affect the correlation of the final variables. We extend these results to Gaussian time series which are transformed to have a different marginal distribution, and show how the autocorrelation function of the final non-Gaussian time series depends on the Gaussian correlations and on the final marginal distribution. As an application of our results, we propose how to generalize standard algorithms producing Gaussian power-law correlated time series in order to create synthetic time series with arbitrary distribution and controlled power-law correlations. Finally, we show a practical example of this algorithm by generating time series mimicking the marginal distribution and the power-law tail of the autocorrelation function of a real time series: the absolute returns of stock prices.
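The transformation this abstract studies can be illustrated with the usual quantile mapping: each Gaussian value is sent through the standard normal CDF to a uniform number, which is then passed to the inverse CDF of the destination distribution. The sketch below is only that generic construction, with an exponential target chosen as an example; the function names are hypothetical, and the paper's results on how the autocorrelation changes under the mapping are not reproduced here.

```python
import math
from statistics import NormalDist


def exponential_inverse_cdf(u, rate=1.0):
    """Inverse CDF of the exponential distribution; requires 0 <= u < 1."""
    return -math.log(1.0 - u) / rate


def transform_series(gaussian_series, inverse_cdf):
    """Map standard-normal values to a new marginal distribution.

    Each value keeps its quantile: z -> inverse_cdf(Phi(z)), where Phi is
    the standard normal CDF. The ordering of the values is preserved.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [inverse_cdf(std_normal.cdf(z)) for z in gaussian_series]
```

Because the mapping is monotonic, the rank order of the series survives, but the linear correlations generally do not; that change is exactly what has to be compensated when a target autocorrelation is wanted in the non-Gaussian output.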

*Showing results 1 — 15 out of 172 results*