Markov katana: a novel method for Bayesian resampling of parameter space applied to phylogenetic trees [article]

Stephen Pollard, Kenji Fukushima, Zhengyuan O Wang, Todd A Castoe, David Pollock
<span title="2018-01-25">2018</span> <i title="Cold Spring Harbor Laboratory"> bioRxiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Phylogenetic inference requires a means to search phylogenetic tree space. This is usually achieved using progressive algorithms that propose and test small alterations in the current tree topology and branch lengths. Current programs search tree topology space using branch-swapping algorithms, but proposals do not discriminate well between swaps likely to succeed or fail. When applied to datasets with many taxa, the huge number of possible topologies slows these programs dramatically. To overcome this, we developed a novel statistical approach for proposal generation in Bayesian analysis, and evaluated its applicability for the problem of searching phylogenetic tree space. The general idea of the approach, which we call 'Markov katana', is to make proposals based on a heuristic algorithm using bootstrapped subsets of the data. Such proposals induce an unintended sampling distribution that must be determined and removed to generate posterior estimates, but the cost of this extra step can in principle be small compared to the added value of more efficient parameter exploration in Markov chain Monte Carlo analyses. Our prototype application uses the simple neighbor joining distance heuristic on data subsets to propose new reasonably likely phylogenetic trees (including topologies and branch lengths). The evolutionary model used to generate distances in our prototype was far simpler than the more complex model used to evaluate the likelihood of phylogenies based on the full dataset. This prototype implementation indicates that the Markov katana approach could be easily incorporated into existing phylogenetic search programs and may prove a useful alternative in conjunction with existing methods. The general features of this statistical approach may also prove useful in disciplines other than phylogenetics. We demonstrate that this method can be used to efficiently estimate a Bayesian posterior.
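The general idea in the abstract (propose from a cheap heuristic run on bootstrapped data subsets, then remove the sampling distribution those proposals induce) can be illustrated with a toy sketch. This is not the authors' implementation and involves no trees or neighbor joining. It assumes a deliberately small problem: a coin-bias parameter on a discrete grid, a hypothetical heuristic (the mean of a bootstrapped subset, snapped to the grid), and an independence Metropolis-Hastings correction as one standard way to remove the proposal distribution. All function names are invented for illustration, and the induced distribution is tabulated by simulation only because the toy state space is tiny.

```python
import math
import random
from collections import Counter


def heuristic_proposal(data, subset_size, grid, rng):
    """Hypothetical cheap heuristic: mean of a bootstrapped subset, snapped to the grid."""
    subset = [rng.choice(data) for _ in range(subset_size)]
    estimate = sum(subset) / subset_size
    return min(range(len(grid)), key=lambda i: abs(grid[i] - estimate))


def estimate_proposal_distribution(data, subset_size, grid, rng, n_draws):
    """Tabulate the distribution the heuristic induces over grid indices."""
    counts = Counter(
        heuristic_proposal(data, subset_size, grid, rng) for _ in range(n_draws)
    )
    total = n_draws + len(grid)  # add-one smoothing keeps every state reachable
    return [(counts[i] + 1) / total for i in range(len(grid))]


def log_likelihood(theta, data):
    """Full-data Bernoulli log-likelihood (flat prior over the grid is assumed)."""
    heads = sum(data)
    return heads * math.log(theta) + (len(data) - heads) * math.log(1.0 - theta)


def exact_posterior(data, grid):
    """Reference posterior by direct normalization, feasible only for a toy grid."""
    logs = [log_likelihood(t, data) for t in grid]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def run_chain(data, grid, q, rng, n_steps):
    """Independence Metropolis-Hastings: proposals come from the tabulated q,
    and the q terms in the acceptance ratio remove its influence."""
    logp = [log_likelihood(t, data) for t in grid]
    states = list(range(len(grid)))
    current = rng.choices(states, weights=q)[0]
    visits = [0] * len(grid)
    for _ in range(n_steps):
        cand = rng.choices(states, weights=q)[0]
        log_ratio = (logp[cand] - logp[current]) + (
            math.log(q[current]) - math.log(q[cand])
        )
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = cand
        visits[current] += 1
    return [v / n_steps for v in visits]


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [i / 20 for i in range(1, 20)]  # candidate biases 0.05 .. 0.95
    data = [1] * 18 + [0] * 42             # 60 synthetic coin flips
    q = estimate_proposal_distribution(data, 20, grid, rng, 5000)
    estimate = run_chain(data, grid, q, rng, 20000)
    exact = exact_posterior(data, grid)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(estimate, exact))
    print(f"total variation distance to exact posterior: {tv:.3f}")
```

In this toy the heuristic concentrates proposals near plausible parameter values, and the `q` terms in the acceptance ratio keep the chain targeting the full-data posterior rather than the heuristic's own distribution. For tree space, tabulating the proposal distribution exhaustively would not be feasible in general, so how that distribution is determined is a matter for the paper itself.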
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="">doi:10.1101/250951</a> <a target="_blank" rel="external noopener" href="">fatcat:6qcd3jiszbbu5akjhm4b6xjabu</a> </span>
