Information theory applications for biological sequence analysis

S. Vinga
2013 Briefings in Bioinformatics  
analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles.  ...  Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology.  ...  She also acknowledges the reviewers' comments and suggestions that greatly improved this review.  ... 
doi:10.1093/bib/bbt068 pmid:24058049 fatcat:ecuz4m4iezdgpdlkegz5zggpyq

Physical complexity of symbolic sequences

C. Adami, N.J. Cerf
2000 Physica D : Non-linear phenomena  
Thus, the physical complexity measures the amount of information about the environment that is coded in the sequence, and is conditional on such an environment.  ...  This physical complexity can be estimated for ensembles of sequences, for which it reverts to the difference between the maximal entropy of the ensemble and the actual entropy given the specific environment  ...  Acknowledgements: This work was supported by the National Science Foundation under Grant no. PHY-9723972. We are indebted to Tom Schneider for pointing out Ref. [19], and to C. Ofria and W.H.  ... 
doi:10.1016/s0167-2789(99)00179-7 fatcat:3qwwwwrgxjb7xf6y5edhqbafze
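
The ensemble estimate described in the snippet above (maximal entropy minus actual entropy) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes sites are independent, so the ensemble entropy is approximated by a sum of per-site entropies, and it takes logarithms in base 4 so that each DNA site contributes at most 1. The function names `site_entropy` and `physical_complexity` are hypothetical.

```python
import math
from collections import Counter


def site_entropy(column, alphabet_size=4):
    """Entropy of one alignment column, in base-`alphabet_size` units (0 to 1)."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n, alphabet_size) for c in counts.values())


def physical_complexity(aligned_seqs, alphabet_size=4):
    """Maximal entropy (sequence length) minus the summed per-site entropies.

    Assumption: sites are treated as independent, which ignores correlations
    between positions.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    actual = sum(site_entropy(col, alphabet_size) for col in zip(*aligned_seqs))
    return length - actual


if __name__ == "__main__":
    # First site varies freely, second site is fully conserved.
    print(physical_complexity(["AA", "CA", "GA", "TA"]))
```

Under this approximation, a fully conserved site contributes one unit and a uniformly random site contributes nothing, so the estimate counts how many sites the ensemble holds fixed.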

Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences [article]

Hector Zenil, Peter Minary
2018 arXiv   pre-print
We test our measures on well-studied genomic sequences of different sizes drawn from different sources.  ...  We introduce and study a set of training-free methods of information-theoretic and algorithmic complexity nature applied to DNA sequences to identify their potential capabilities to determine nucleosomal  ...  Nucleosome location is an ideal test case to probe how informative sequence-based indices of complexity can be in determining structural (and thus some functional) properties of genomic DNA, and how much  ... 
arXiv:1708.01751v3 fatcat:terca5zg2jaqlmnh2ahz2gnm6y

Sequence complexity in Darwinian evolution

Christoph Adami
2002 Complexity  
Physical complexity is a measure based on automata theory and information theory that turns out to be a simple and intuitive measure of the amount of information that an organism stores, in its genome,  ...  It can be shown that the physical complexity of the genomes of clonal organisms must increase in evolution, if they occupy a single niche and if the environment does not change.  ...  ACKNOWLEDGMENTS I am grateful to Murray Gell-Mann for explaining to me the relationship between physical complexity and his effective complexity, and to Charles Ofria and Travis Collier for collaboration  ... 
doi:10.1002/cplx.10071 fatcat:dcr45xhsnndqxdbdtgjmf6ampu

Complexity Estimation of Genetic Sequences Using Information-Theoretic and Frequency Analysis Methods

Robertas Damaševičius
2010 Informatica  
In this paper, the complexity of genetic sequences is estimated using Shannon entropy, Rényi entropy and relative Kolmogorov complexity.  ...  The structural complexity based on periodicities is analyzed using the autocorrelation function and time delayed mutual information.  ...  However, the information-theoretic methods do not measure the structural organization of the sequence.  ... 
doi:10.15388/informatica.2010.270 fatcat:slc2qqwyhfae7m2xbsoa7tfqim
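
Two of the measures named in the snippet above, Shannon entropy and Rényi entropy, can be illustrated with their standard textbook definitions applied to single-symbol frequencies. This is a sketch under that assumption; the paper's own estimators (for example, over longer words or windows) may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def symbol_probs(seq):
    """Relative frequency of each symbol that occurs in `seq`."""
    counts = Counter(seq)
    n = len(seq)
    return [c / n for c in counts.values()]


def shannon_entropy(seq):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in symbol_probs(seq))


def renyi_entropy(seq, alpha):
    """Renyi entropy of order alpha in bits: log2(sum(p**alpha)) / (1 - alpha).

    The order alpha = 1 is defined by its limit, the Shannon entropy.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if math.isclose(alpha, 1.0):
        return shannon_entropy(seq)
    return math.log2(sum(p ** alpha for p in symbol_probs(seq))) / (1 - alpha)


if __name__ == "__main__":
    seq = "ACGTACGTAAAA"
    print(shannon_entropy(seq), renyi_entropy(seq, 2))
```

For a uniform distribution over the four nucleotides both measures equal 2 bits; for skewed compositions the Rényi entropy of order 2 falls below the Shannon entropy.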

DNA Sequences at a Glance

Armando J. Pinho, Sara P. Garcia, Diogo Pratas, Paulo J. S. G. Ferreira, Cynthia Gibas
2013 PLoS ONE  
In this paper we present a new concept, the "information profile", which provides a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing.  ...  We also describe a tool to compute the information profiles of a given DNA sequence, and use the genome of the fission yeast Schizosaccharomyces pombe strain 972 h⁻ and five human chromosomes 22 for illustration  ...  Conceived and designed the experiments: AJP SPG DP PJSGF. Performed the experiments: SPG DP. Analyzed the data: AJP SPG. Wrote the paper: AJP SPG DP PJSGF.  ... 
doi:10.1371/journal.pone.0079922 pmid:24278218 pmcid:PMC3836782 fatcat:uecoikdq25d65odo2zngijfk7a

Information Analysis of DNA Sequences [article]

Riyazuddin Mohammed
2010 arXiv   pre-print
In this paper, we consider entropy as a measure of information by modifying the entropy expression to take into account the varying length of these sequences.  ...  The problem of differentiating the informational content of coding (exons) and non-coding (introns) regions of a DNA sequence is one of the central problems of genomics.  ...  Motivation: The complexity and information-carrying capacity of DNA data make genomic sequence analysis an attractive research area today.  ... 
arXiv:1010.4205v1 fatcat:bom32sxatzfq3js53kauc44gaa

Information Measure for Long-Range Correlated Sequences: the Case of the 24 Human Chromosomes

A. Carbone
2013 Scientific Reports  
A new approach to estimate the Shannon entropy of a long-range correlated sequence is proposed.  ...  From the information theory standpoint, this means that the power-law correlated clusters carry the same information of the whole analysed sequence.  ...  structure by statistical methods.  ... 
doi:10.1038/srep02721 pmid:24056670 pmcid:PMC3779848 fatcat:ywcjae7z35bdtakdxlj5joxq7q

Statistics of local complexity in amino acid sequences and sequence databases

John C. Wootton, Scott Federhen
1993 Computers and Chemistry  
The definitions are: (1) those derived from enumeration a priori by a treatment analogous to statistical mechanics, (2) a log likelihood definition of complexity analogous to informational entropy, (  ...  3) multinomial probabilities of observed compositions, (4) an approximation resembling the χ² statistic and (5) a modification of the coefficient of divergence.  ...  The horizontal axes are the same for all the plots and represent position in sequence.  ... 
doi:10.1016/0097-8485(93)85006-x fatcat:w6y2bvfc2zgenlebljx2vqk4su

Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences

2019 Nucleic Acids Research  
We test the measures on well-studied genomic sequences of different sizes drawn from different sources.  ...  We introduce and study a set of training-free methods of an information-theoretic and algorithmic complexity nature that we apply to DNA sequences to identify their potential to identify nucleosomal binding  ...  Nucleosome location is an ideal test case to probe how informative sequence-based indices of complexity can be in determining structural (and thus some functional) properties of genomic DNA, and how much  ... 
doi:10.1093/nar/gkz750 pmid:31511887 pmcid:PMC6846163 fatcat:7wydzyy62nf3njwqjauy4oyx2u

Information theory as a model of genomic sequences [chapter]

Chengpeng Bi, Peter K. Rogan
2005 Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics  
Lin, J. (1991) Divergence measure based on the Shannon entropy. IEEE Trans. on Info. Theory, 37, 145-151.  ...  Information theory has been applied to the analysis of DNA and protein sequences in several ways: (1) by analyzing sequence complexity from the Shannon-Weaver indices of smaller DNA windows contained in  ...  Thomas Schneider for valuable suggestions and comments. Related Articles  ... 
doi:10.1002/047001153x.g402204 fatcat:q3ikvpgcxrd6lahywz26ixw6be

Similarity of symbolic sequences [article]

B. Kozarzewski
2011 arXiv   pre-print
The new similarity measure works well both for short sequences (tens of letters) and for very long ones (a hundred thousand letters).  ...  A new numerical characterization of symbolic sequences is proposed. The partition of the sequence based on the Ke and Tong algorithm is the starting point.  ...  One of the first quantitative measures of the complexity of symbolic sequences was provided by Lempel and Ziv [1].  ... 
arXiv:1108.1979v2 fatcat:kwgurbmzojg7bk4jos4hy3sfyi

Signal detection in genome sequences using complexity based features

Mehdi Kargar, Aijun An, Nick Cercone, Kayvan Tirdad, Morteza Zihayat
2013 Proceedings of the 12th International Workshop on Data Mining in Bioinformatics - BioKDD '13  
In this work, we tackle the problem of evaluating complexity methods and measures for finding interesting signals in the whole genome of three prokaryotic organisms.  ...  Also, we investigate whether positions and lengths of windows in ORFs have significant impact on distinguishing between genes and pseudo-genes.  ...  The complexity of sequences is useful in reproducing phylogenetic trees, compacting biological sequences, identifying the genomic structures and studying genomic evolution [9] .  ... 
doi:10.1145/2500863.2500867 dblp:conf/kdd/KargarACTZ13 fatcat:vr7u6byuinaxtfebkinal72aum

Segmenting DNA sequence into `words' [article]

Wang Liang
2013 arXiv   pre-print
Firstly, we find the length of most DNA 'words' is 12 to 15 bps by analyzing the genomes of 12 model species. Then we design an unsupervised probability based approach to segment the DNA sequences.  ...  This paper presents a novel method to segment/decode DNA sequences based on n-grams statistical language model.  ...  In terms of n-grams analysis, perplexity is a measure of the average branching factor and can be used to measure how well an n-gram predicts the next juncture type in the test set.  ... 
arXiv:1202.2518v4 fatcat:t75yrqheaff3bcklo44j333alm
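
The perplexity mentioned in the snippet above can be sketched with its standard definition: 2 raised to the average negative log2 probability that a model assigns to each observed token. The example assumes the per-token probabilities have already been produced by some n-gram model; the function name `perplexity` and the input format are hypothetical and not taken from the paper.

```python
import math


def perplexity(token_probs):
    """Perplexity of a model over a test set, given the probability it
    assigned to each observed token: 2 ** (-(1/N) * sum(log2(p)))."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0 or p > 1 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    avg_log2 = sum(math.log2(p) for p in token_probs) / len(token_probs)
    return 2 ** (-avg_log2)


if __name__ == "__main__":
    # A model that spreads probability evenly over four nucleotides
    # behaves like a four-way choice at every position.
    print(perplexity([0.25] * 10))
```

This matches the "average branching factor" reading: a model that is as uncertain as a uniform choice among k symbols has perplexity k, and better predictions push the value toward 1.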

Protein structure prediction from sequence variation

Debora S Marks, Thomas A Hopf, Chris Sander
2012 Nature Biotechnology  
Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods.  ...  Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins.  ...  of positions, say position 15 and 67, in any one sequence, summing over all sequences in the multiple-sequence alignment.  ... 
doi:10.1038/nbt.2419 pmid:23138306 pmcid:PMC4319528 fatcat:rkbhf4on7bcuvcabgdyghtrchi