LEO-II and Satallax on the Sledgehammer test bench

Nik Sultana, Jasmin Christian Blanchette, Lawrence C. Paulson
2013 Journal of Applied Logic  
Sledgehammer is a tool that harnesses external first-order automatic theorem provers (ATPs) to discharge interactive proof obligations arising in Isabelle/HOL. We extended it with LEO-II and Satallax, the two most prominent higher-order ATPs, improving its performance on higher-order problems. To explore their usefulness, these ATPs are measured against first-order ATPs and built-in Isabelle tactics on a variety of benchmarks from Isabelle and the TPTP library. Sledgehammer provides an ideal test bench for individual features of LEO-II and Satallax, revealing areas for improvement.

This paper presents a double materialisation of this vision: an extension of Sledgehammer with LEO-II and Satallax as additional backends (Section 3). The extension reuses many components of Sledgehammer, including the parallel architecture and the relevance filter, but communicates with the ATPs in the higher-order language THF0. Although LEO-II, Satallax, and Isabelle all support "higher-order logic", the translation from Isabelle to THF0 is nontrivial because THF0 does not cater for polymorphic types and axiomatic type classes, which are ubiquitous in Isabelle formalisations. The integration is useful for proving goals where higher-order features predominate, as demonstrated by a few examples (Section 4). To ascertain more precisely the potential of LEO-II and Satallax, we let them compete on standard Isabelle benchmarks against first-order ATPs and built-in Isabelle tactics (Section 5.1). Although they are nowhere near as powerful as the first-order ATPs, they can occasionally solve problems that no other prover or tactic can solve. To make the evaluation more informative, the Isabelle problems are complemented by a subset of the TPTP library that emphasises the higher-order aspects of the logic. By tuning Sledgehammer's translation, we carried out a fine-grained evaluation (Section 5.2) of the higher-order ATPs' handling of types and λ-abstractions (two problem features we would expect them to handle well) and of large background theories. Sledgehammer then acts as a test bench for LEO-II and Satallax, suggesting avenues for improvement.

Background

This paper combines several technologies (TPTP, LEO-II, Satallax, Isabelle/HOL, and Sledgehammer) that are amply described elsewhere. This section briefly outlines them.

TPTP Formats

The TPTP infrastructure defines a hierarchy of languages [8, 35].
Of interest to us are the first-order form (FOF) for first-order logic with equality over untyped terms, the core typed first-order form (TFF0) that extends FOF with simple types (sorts), and the core typed higher-order form (THF0) for higher-order logic. Ignoring minor syntactic differences, the strict inclusions FOF ⊂ TFF0 ⊂ THF0 hold. THF0 types are either type constants κ or function types σ → τ, where σ and τ are arbitrary types. The types of propositions o and of individuals ι are predefined. The intended semantics of THF0 is Henkin semantics with extensionality and Hilbert choice. We take some liberties with the syntax, preferring traditional notations and omitting the apply operator @; thus, we write f X Y rather than f @ X @ Y for the application of the curried function f to X and Y. We do honour the TPTP convention that variable names start with an uppercase letter and constants with a lowercase letter. The use of sans serif for constants further emphasises this distinction.

LEO-II and Satallax

The higher-order automatic provers LEO-II [7] and Satallax [3, 16] both take THF0 as their input language. Both attempt to find a refutation from the axioms and the negated conjecture, which amounts to a proof of the original conjecture. To improve their effectiveness, both provers implement strategy scheduling: they try a sequence of option settings, each for a fraction of the allotted time. LEO-II implements a higher-order resolution calculus and periodically dispatches first-order subproblems to a cooperating first-order prover, usually E. LEO-II features several
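To make the THF0 conventions described above concrete (types built from type constants κ and function types σ → τ, curried application written explicitly with @, uppercase variables and lowercase constants), the following is a minimal Python sketch. It is an illustration only, not Sledgehammer's translation code (which is written in Standard ML); all class and function names (TyConst, Fun, Var, Const, App, show_ty, type_of, show) are hypothetical. It assumes TPTP's concrete notation `$o` and `$i` for the predefined types o and ι, `>` for the function arrow (right-associative), and `@` for application (left-associative), and it deliberately omits λ-abstractions, quantifiers, and everything Isabelle-specific such as polymorphism.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# THF0 types: a type constant (kappa) or a function type sigma -> tau.
@dataclass(frozen=True)
class TyConst:
    name: str


@dataclass(frozen=True)
class Fun:
    dom: Ty
    cod: Ty


Ty = Union[TyConst, Fun]

# Predefined types of propositions (o) and individuals (iota), in TPTP syntax.
O = TyConst("$o")
I = TyConst("$i")


def show_ty(t: Ty) -> str:
    """Render a type in THF syntax; '>' associates to the right."""
    if isinstance(t, TyConst):
        return t.name
    dom = show_ty(t.dom)
    if isinstance(t.dom, Fun):  # a functional argument type needs parentheses
        dom = f"({dom})"
    return f"{dom} > {show_ty(t.cod)}"


@dataclass(frozen=True)
class Var:
    name: str
    ty: Ty

    def __post_init__(self) -> None:
        # TPTP convention: variable names start with an uppercase letter.
        if not self.name[:1].isupper():
            raise ValueError(f"variable {self.name!r} must start uppercase")


@dataclass(frozen=True)
class Const:
    name: str
    ty: Ty

    def __post_init__(self) -> None:
        # TPTP convention: constant names start with a lowercase letter.
        if not self.name[:1].islower():
            raise ValueError(f"constant {self.name!r} must start lowercase")


@dataclass(frozen=True)
class App:
    fun: Term
    arg: Term


Term = Union[Var, Const, App]


def spine(t: Term) -> tuple[Term, list[Term]]:
    """Split a curried application f X Y into (f, [X, Y])."""
    args = []
    while isinstance(t, App):
        args.append(t.arg)
        t = t.fun
    return t, args[::-1]


def show(t: Term) -> str:
    """Render a term with THF0's explicit apply operator @."""
    if isinstance(t, (Var, Const)):
        return t.name
    head, args = spine(t)
    return "(" + " @ ".join(show(x) for x in [head, *args]) + ")"


def type_of(t: Term) -> Ty:
    """Infer a term's type, checking that each application is well typed."""
    if isinstance(t, (Var, Const)):
        return t.ty
    f_ty = type_of(t.fun)
    if not isinstance(f_ty, Fun) or f_ty.dom != type_of(t.arg):
        raise TypeError(f"ill-typed application: {show(t)}")
    return f_ty.cod


if __name__ == "__main__":
    f = Const("f", Fun(I, Fun(I, O)))  # f : iota -> iota -> o
    X, Y = Var("X", I), Var("Y", I)
    fxy = App(App(f, X), Y)            # the text's "f X Y"
    print(show(fxy))                   # (f @ X @ Y)
    print(show_ty(f.ty))               # $i > $i > $o
    print(show_ty(type_of(fxy)))       # $o
```

Representing application as a binary App node mirrors the curried reading of f X Y; the spine helper only flattens it for printing, so the internal representation stays faithful to the two-argument-as-nested-application view.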
doi:10.1016/j.jal.2012.12.002