Scale-free duplication dynamics: A model for ultraduplication

M. V. Koroteev, J. Miller
2011 Physical Review E  
Empirical studies of the genome-wide length distribution of duplicated sequences have revealed an algebraic tail common to nearly all clades. The decay of the tail is often well approximated by a single exponent that takes values within a limited range. We propose and study here scale-free duplication dynamics, a class of model for genome sequence evolution that generates the observed shapes of this distribution. A transition between self-similar and non-self-similar regimes is exhibited. Our
model accounts plausibly for the observed form of the algebraic tail, which is not produced by standard models for generating long-range sequence correlations.

PACS numbers: 05.50.+q,

I. INTRODUCTION

The field of comparative genomics, of pivotal importance to medicine, biotechnology and the basic biosciences, is in large part the game of inferring functionality from sequence conservation. Its premise is that selective adaptation acts on neutral (sequence) variation [1]. If, for any given sequence, it can be established that its conservation among diverse species is improbable under neutral sequence variation alone, then negative selection on the function of that sequence is inferred de facto. This premise underlies the 'conservation tracks' at the UCSC genome browser, for example [2]. Consequently, the choice of model for neutral genome evolution can have a major impact on the computational inference of whether or not a sequence is functional.

The discovery in the early 1990s of long-range algebraically decaying two-point base correlations (LRC) in natural genome sequences [3] received wide attention in the physics literature, wherein several models of neutral genome evolution were proposed to account for it. Over the years the Li expansion-modification model [4] seems to have achieved the greatest visibility, in part because it generates algebraic correlations (a non-local effect) via a local genome growth dynamics; values of the exponents for this model have been derived analytically [5].
doi:10.1103/physreve.84.061919 pmid:22304128 fatcat:ywkc2yjqvncgrmczm4pz4pisru