Concerning the NJ algorithm and its unweighted version, UNJ [chapter]

Olivier Gascuel
1997 DIMACS Series in Discrete Mathematics and Theoretical Computer Science  
…$D = (d_{ij})$, which should be as close as possible to $T$. There are, of course, several ways of defining the proximity between trees. In this paper, we focus on the structure of the trees $\hat{T}$ and $T$ rather than on the lengths of their edges. This problem arises in domains where one tries to reconstruct the history of an inheritance phenomenon, for example the history of manuscripts in archaeology (Buneman 1971) or the evolution of species in biology (Swofford et al. 1996). The tree $T$ represents … the history, the distances $d_{ij}$ represent the divergence times between these objects or species, and the dissimilarities $\delta_{ij}$ are estimates of these divergence times.
A classical approach is to construct the positively valued tree that best represents $\Delta$ according to the least-squares criterion (Cunningham 1978; De Soete 1983; Roux 1988; Gascuel and Levy 1996). Although simulation results are good (Kuhner and …), this approach is not entirely satisfactory since, to the best of our knowledge, very few results establish a link between the structure of the inferred tree $\hat{T}$ and the structure of the true tree $T$. This gap is partially filled by another approach, widely used in the study of evolution (Kidd and Sgaramella-Zonta 1971; Saitou and Nei 1987), called the minimum evolution (ME) principle. This principle consists in seeking, among all possible tree structures, the one that yields the "shortest" valued tree. The length of a valued tree is the sum of the lengths (valuations) of its edges; within the ME principle, these lengths are estimated using the least-squares criterion without the positivity constraint. Once the tree structure is fixed, the edge lengths are obtained by minimizing the Euclidean distance between $D$ and $\Delta$. This minimization problem has a unique solution (described below, Section 2), so the length associated with a tree structure is well defined. Rzhetsky and Nei (1993) provide a thorough justification of the ME principle. They show that if the estimates $\delta_{ij}$ are unbiased, i.e., $E(\delta_{ij}) = d_{ij}$, then the structure of the true tree $T$ has, among all possible tree structures, the shortest expected …
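To make the ME principle concrete, here is a minimal sketch. For each fixed tree structure, it fits edge lengths by unconstrained ordinary least squares, sums them to get the tree length, and keeps the shortest structure. This is an illustration under stated assumptions, not the paper's method (NJ/UNJ) or its Section 2 formulas. The helper names (`ols_edge_lengths`, `tree_length`, `quartet`, etc.), the brute-force normal-equation solver, and the 4-taxon dissimilarities are all hypothetical. The toy values are made up to be exactly additive on the split AB|CD.

```python
from itertools import combinations


def symmetric(pair_values):
    """Build a symmetric dict-of-dicts from {(i, j): value} entries."""
    d = {}
    for (i, j), v in pair_values.items():
        d.setdefault(i, {})[j] = v
        d.setdefault(j, {})[i] = v
    return d


def path_edges(adj, u, v):
    """Return the indices of the edges on the unique u-v path of a tree."""
    stack = [(u, None, [])]
    while stack:
        node, parent, path = stack.pop()
        if node == v:
            return path
        for nbr, k in adj[node]:
            if nbr != parent:  # in a tree, skipping the parent avoids revisits
                stack.append((nbr, node, path + [k]))
    raise ValueError(f"{u!r} and {v!r} are not connected")


def solve(M, b):
    """Solve the square system M x = b (Gaussian elimination, partial pivoting)."""
    n = len(M)
    A = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


def ols_edge_lengths(edges, leaves, delta):
    """Unconstrained OLS edge lengths for a fixed tree structure.

    Minimizes the sum over leaf pairs of (delta_ij - d_ij)^2, where d_ij is the
    sum of the edge lengths on the i-j path. Negative lengths are allowed.
    """
    adj = {}
    for k, (a, b) in enumerate(edges):
        adj.setdefault(a, []).append((b, k))
        adj.setdefault(b, []).append((a, k))
    pairs = list(combinations(leaves, 2))
    m = len(edges)
    # Row p of the design matrix flags the edges on the path of leaf pair p.
    rows = []
    for i, j in pairs:
        on_path = set(path_edges(adj, i, j))
        rows.append([1.0 if k in on_path else 0.0 for k in range(m)])
    # Normal equations (A^T A) l = A^T delta; the solution is unique for
    # fully resolved trees such as the quartets used below.
    AtA = [[sum(r[p] * r[q] for r in rows) for q in range(m)] for p in range(m)]
    Atd = [sum(r[p] * delta[i][j] for r, (i, j) in zip(rows, pairs)) for p in range(m)]
    return solve(AtA, Atd)


def tree_length(edges, leaves, delta):
    """ME criterion: total length of the OLS-valued tree."""
    return sum(ols_edge_lengths(edges, leaves, delta))


def quartet(a, b, c, d):
    """Edge list of the 4-leaf tree with split {a, b} | {c, d} (internal nodes u, v)."""
    return [(a, "u"), (b, "u"), ("u", "v"), ("v", c), ("v", d)]


if __name__ == "__main__":
    leaves = ["A", "B", "C", "D"]
    # Made-up dissimilarities, additive on AB|CD with edge lengths 1, 2, 3, 1, 2.
    delta = symmetric({("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
                       ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3})
    for split in [("A", "B", "C", "D"), ("A", "C", "B", "D"), ("A", "D", "B", "C")]:
        length = tree_length(quartet(*split), leaves, delta)
        print(f"{split[0]}{split[1]}|{split[2]}{split[3]}: {length:.3f}")
```

Running the script prints one total length per quartet structure. The split the toy data were generated from gets the smallest OLS length. Fitting a wrong split yields a negative internal edge, which the unconstrained criterion permits. Exhaustive enumeration of structures like this is only feasible for very small numbers of taxa.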
doi:10.1090/dimacs/037/09