Reconstructing Trees When Sequence Sites Evolve at Variable Rates

1994 Journal of Computational Biology  
For a sequence of colors independently evolving on a tree under a simple Markov model, we consider conditions under which the tree can be uniquely recovered from the "sequence spectrum" -the expected frequencies of the various leaf colorations. This is relevant for phylogenetic analysis (where colors represent nucleotides or amino acids; leaves represent extant taxa) as the sequence spectrum is estimated directly from a collection of aligned sequences. Allowing the rate of the evolutionary
more » ... e evolutionary process to vary across sites is an important extension over most previous studies -we show that, given suitable restrictions on the rate distribution, the true tree (up to the placement of its root) is uniquely identified by its sequence spectrum. However, if the rate distribution is unknown and arbitrary, then, for simple models, it is possible for every tree to produce the same sequence spectrum. Hence there is a logical barrier to accurate, consistent phylogenetic inference for these models when assumptions about the rate distribution are not made. This result exploits a novel theorem on the action of polynomials with non-negative coefficients on sequences.
doi:10.1089/cmb.1994.1.153 pmid:8790461 fatcat:igoxohkt3betzc4yjib5xcegui