Lattice-based similarity measures between ordered trees
Journal of Mathematical Psychology
A clustering algorithm has recently been developed by Reitman and Rueter to express both the structure of chunking in multi-trial free recall and the order of chunk production. The resulting ordered trees differ from ordinary rooted trees in that the elements of a chunk, at any level, may be restricted to a specific ordering. In order to make comparisons of long-term memory structures between subjects, a measure of the similarity between trees is needed. Previously developed similarity measures
... are shown to be inadequate for ordered trees. Lattice theory is used to generate new similarity measures suited to these richer structures. First, ordered trees are shown to form a nonmodular, graded lattice. Then, moves through this lattice are defined and used to produce several distance measures. These new measures are compared with each other and with existing measures, both by examining the properties of each measure and by applying them to hypothetical trees. The lattice-based measures prove to be theoretically superior but are harder to compute. The general problem of describing paths in a nonmodular lattice is discussed.

... representations of psychological distance. By removing the root, Cunningham does not require hierarchical relationships. Instead, path lengths through the tree represent psychological distances. This formulation has also been discussed by Carroll (1976) and by Sattath and Tversky (1977), among others. In addition, free trees allow both the terminal and the nonterminal nodes to be labeled. Cunningham further extends this representation to account for asymmetry by introducing bidirectional trees, in which the length of the link from node a to node b may differ from the length of the link from b to a.

Another type of hierarchical tree is the PQ-tree (Booth & Lueker, 1976), which represents classes of permutations in which certain subsets of elements must appear consecutively. PQ-trees, while virtually unknown to psychology, are highly relevant to the fields of graph theory and computer science. They are rooted, bare trees that contain two types of nonterminal nodes: P-nodes, in which the elements may be permuted into any order, and Q-nodes, in which the elements may appear in only two orders, the given order and its inverse.

The Reitman-Rueter (1980) clustering algorithm was created to explain the structure of chunks and the order of chunk production in multi-trial free recall.
The algorithm produces an ordered tree that is identical to a PQ-tree except for one additional type of nonterminal node, in which the elements are fixed to a single order. Reitman and Rueter call P-nodes nondirectional, Q-nodes bidirectional, and the third class of nodes unidirectional. It is crucial to note that neither Reitman-Rueter ordered trees nor PQ-trees are based on an underlying similarity matrix of pairwise distances between items; both are based instead on the regularities of elements within a set of linear strings.

The focus of this article is on the similarity between Reitman-Rueter ordered trees generated from the same recall set. Similarity between ordinary rooted trees (both valued and bare) has been addressed by Boorman and Olivier (1973). Likewise, Cunningham (1980, Note 1) addresses similarity between free trees. However, as will be discussed below, neither of these approaches can adequately be applied to ordered trees. This paper presents a method for assessing the similarity of ordered trees.¹

The first section of this paper introduces Reitman-Rueter trees and defines some basic terminology. With this background in place, section two reviews previous work on similarity between trees and explains why ordered trees are a special case. In section three, lattice theory is introduced to describe formally the relationships among Reitman-Rueter trees in fine detail. The lattice framework suggests several distance measures, which are then compared and contrasted in the final section.

¹ The approach taken here could also be applied to PQ-trees with only slight modifications to the height function and covering relationships defined later in this paper. However, I have chosen to address only ordered trees at this time.
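
To make the three node types described above concrete, the following minimal Python sketch enumerates the linear strings that are compatible with a given ordered tree. It is an illustration only, not the Reitman-Rueter clustering algorithm (which induces a tree from recall strings). The tuple representation, the label "U" for unidirectional nodes, and the function name `orderings` are assumptions introduced here.

```python
from itertools import permutations, product

# Node kinds, using the Reitman-Rueter names from the text.
# The one-letter labels (especially "U") are hypothetical shorthand.
NONDIRECTIONAL = "P"   # elements may appear in any order
BIDIRECTIONAL = "Q"    # elements appear in the given order or its inverse
UNIDIRECTIONAL = "U"   # elements appear in the given order only


def orderings(tree):
    """Return the set of item sequences compatible with an ordered tree.

    A tree is either a leaf (a string) or a pair (kind, children),
    where children is a list of subtrees.
    """
    if isinstance(tree, str):
        return {(tree,)}

    kind, children = tree
    if kind == NONDIRECTIONAL:
        arrangements = list(permutations(children))
    elif kind == BIDIRECTIONAL:
        arrangements = [tuple(children), tuple(reversed(children))]
    elif kind == UNIDIRECTIONAL:
        arrangements = [tuple(children)]
    else:
        raise ValueError(f"unknown node kind: {kind!r}")

    result = set()
    for arrangement in arrangements:
        # Combine every compatible sequence of each child, in this arrangement.
        for parts in product(*(orderings(child) for child in arrangement)):
            result.add(tuple(item for part in parts for item in part))
    return result


# A nondirectional root over a unidirectional chunk and a bidirectional chunk.
example = (NONDIRECTIONAL, [
    (UNIDIRECTIONAL, ["a", "b"]),
    (BIDIRECTIONAL, ["c", "d", "e"]),
])
strings = orderings(example)
```

For this hypothetical tree the two chunks may be produced in either order, the chunk (a, b) only as given, and the chunk (c, d, e) either forwards or backwards, so four strings are compatible with it.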