A Variant of Higher-Order Anti-Unification

Alexander Baumgartner, Temur Kutsia, Jordi Levy, Mateu Villaret, Marc Herbstritt
2013 International Conference on Rewriting Techniques and Applications  
We present a rule-based anti-unification algorithm in Huet's style for simply-typed lambda-terms in η-long β-normal form, which computes a least general higher-order pattern generalization. For a pair of arbitrary terms of the same type, such a generalization always exists and is unique modulo α-equivalence and variable renaming. The algorithm computes it in cubic time within linear space. It has been implemented, and the code is freely available.

... higher-order patterns by permitting free variables ... to apply to object terms, not only to bound variables. Object terms may contain constants, free variables, and variables that are bound outside object terms. The algorithm has been implemented and was used for inductive generalization.

Anti-unification in a restricted version of λ2 (a second-order λ-calculus with type variables [4]) has been studied in [23] with applications in analogical programming and analogical theorem proving. The imposed restrictions guarantee uniqueness of the least general generalization. This algorithm, as well as the one for higher-order patterns by Pfenning [28], has influenced the generalization algorithm used in the program transformation technique called supercompilation [24]. There are other fragments of higher-order anti-unification, motivated by analogical reasoning. A restricted version of second-order generalization developed in [15] has an application in the replay of program derivations. A symbolic analogy model, called Heuristic-Driven Theory Projection, uses yet another restriction of higher-order anti-unification to detect analogies between different domains [18].

The last decade has seen a revived interest in anti-unification. The problem has been studied in various theories (e.g., [1, 2, 9, 19]) and from different application points of view (e.g., [3, 8, 18, 23, 31, 22]). A particularly interesting application comes from software code refactoring, where anti-unification is used to find similar pieces of code, e.g., in Python, Java [6, 7], and Erlang [22] programs. These approaches are based on first-order anti-unification [29, 30]. To advance refactoring and clone detection techniques for languages based on λProlog, one needs to employ anti-unification for higher-order terms. This potential application can serve as a motivation to look into the problem of higher-order anti-unification in more detail.
In this paper, we revisit the problem of higher-order anti-unification, permit arbitrary terms as the input and require higher-order patterns in the output, and present an algorithm in the simply-typed setting. The main contributions can be briefly summarized as follows:

1. Designing a rule-based anti-unification algorithm in simply-typed λ-calculus (in Sect. 3). The inputs of the algorithm are arbitrary terms in η-long β-normal form. The output is a higher-order pattern. The formulation follows Huet's simple and elegant style [17]. The global function for recording disagreements is represented as a store, in the spirit of [1, 2].
2. Proofs of the termination, soundness, and completeness properties of the anti-unification algorithm (in Sect. 4) and its subalgorithm, which computes permuting matchers between patterns (in Sect. 3.2).
3. Complexity analysis (in Sect. 4): The algorithm computes a least general pattern generalization, which always exists and is unique modulo α-equivalence, in cubic time and requires linear space. As in related work, we assume that symbols and pointers are encoded in constant space, and that basic operations on them are performed in constant time.
4. Free open-source implementation for both simply-typed and untyped calculi (Sect. 5).

Here we briefly compare our work with the existing results in higher-order anti-unification. The approaches closest to ours are the following two: In [28], Pfenning studied anti-unification in the Calculus of Constructions, whose type system is richer than the simple types we consider. Both the input and the output was
doi:10.4230/lipics.rta.2013.113 dblp:conf/rta/BaumgartnerKLV13 fatcat:eve7aldrfnfk5dih3vnwctsya4