Computing Quartet Distance is Equivalent to Counting 4-Cycles [article]

Bartłomiej Dudek, Paweł Gawrychowski
2020 arXiv pre-print
The quartet distance is a measure of similarity used to compare two unrooted phylogenetic trees on the same set of n leaves, defined as the number of subsets of four leaves related by a different topology in both trees. After a series of previous results, Brodal et al. [SODA 2013] presented an algorithm that computes this number in 𝒪(nd log n) time, where d is the maximum degree of a node. Our main contribution is a two-way reduction establishing that the complexity of computing the quartet distance between two trees on n leaves is the same, up to polylogarithmic factors, as the complexity of counting 4-cycles in an undirected simple graph with m edges. The latter problem has been extensively studied, and the fastest known algorithm by Vassilevska Williams [SODA 2015] works in 𝒪(m^{1.48}) time. In fact, even for the seemingly simpler problem of detecting a 4-cycle, the best known algorithm works in 𝒪(m^{4/3}) time, and a conjecture of Yuster and Zwick implies that this might be optimal. In particular, an almost-linear time algorithm for computing the quartet distance would imply a surprisingly efficient algorithm for counting 4-cycles. In the other direction, by plugging in the state-of-the-art algorithms for counting 4-cycles, our reduction allows us to significantly decrease the complexity of computing the quartet distance. For trees with unbounded degrees we obtain an 𝒪(n^{1.48}) time algorithm, which is a substantial improvement on the previous bound of 𝒪(n^2 log n). For trees with degrees bounded by d, by analysing the reduction more carefully, we are able to obtain an Õ(nd^{0.77}) time algorithm, which is again a nontrivial improvement on the previous bound of 𝒪(nd log n).
arXiv:1811.06244v2 fatcat:kdd6gz5g6zcfrb6wnjx2ztgpim
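
To make the two problems named in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the reduction or any algorithm from the paper, and it comes nowhere near the stated running times. The function names, the edge-list input format, and the use of unit edge lengths with the four-point condition to read off a quartet's topology are all assumptions made for illustration. 4-cycles are counted with the standard common-neighbour identity, and the quartet distance is computed by enumerating every subset of four leaves.

```python
from collections import deque
from itertools import combinations


def count_4_cycles(n, edges):
    """Count 4-cycles in a simple undirected graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for u, v in combinations(range(n), 2):
        common = len(adj[u] & adj[v])
        # u and v are opposite corners of one 4-cycle per pair of common neighbours.
        total += common * (common - 1) // 2
    # Every 4-cycle has two pairs of opposite corners, so it was counted twice.
    return total // 2


def _adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def _distances_from(adj, source):
    """Breadth-first search distances (unit edge lengths) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def _topology(dist, a, b, c, d):
    """Return the split of {a, b, c, d} induced by the tree, or "star"."""
    options = [
        (dist[a][b] + dist[c][d], frozenset([frozenset([a, b]), frozenset([c, d])])),
        (dist[a][c] + dist[b][d], frozenset([frozenset([a, c]), frozenset([b, d])])),
        (dist[a][d] + dist[b][c], frozenset([frozenset([a, d]), frozenset([b, c])])),
    ]
    best = min(s for s, _ in options)
    winners = [split for s, split in options if s == best]
    # Four-point condition: a unique smallest sum identifies the split;
    # a tie means the four leaves are unresolved (star topology).
    return winners[0] if len(winners) == 1 else "star"


def quartet_distance_naive(edges1, edges2):
    """Quartet distance of two trees given as edge lists (assumed same leaf set)."""
    adj1, adj2 = _adjacency(edges1), _adjacency(edges2)
    leaves = sorted(v for v in adj1 if len(adj1[v]) == 1)
    d1 = {v: _distances_from(adj1, v) for v in leaves}
    d2 = {v: _distances_from(adj2, v) for v in leaves}
    return sum(
        1 for q in combinations(leaves, 4)
        if _topology(d1, *q) != _topology(d2, *q)
    )
```

For example, `count_4_cycles(4, [(0, 1), (1, 2), (2, 3), (3, 0)])` counts the single 4-cycle of a square. The quartet enumeration inspects all subsets of four leaves, so it is only practical as a correctness check on small trees.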