Evaluating LTL Satisfiability Solvers [chapter]

Viktor Schuppan, Luthfi Darmawan
2011 Lecture Notes in Computer Science  
We perform a comprehensive experimental evaluation of off-the-shelf solvers for satisfiability of propositional LTL. We consider a wide range of solvers implementing three major classes of algorithms: reduction to model checking, tableau-based approaches, and temporal resolution. Our set of benchmark families is significantly more comprehensive than those in previous studies: it takes the benchmark families of previous studies, which have only a limited overlap, and adds benchmark families not used for that purpose before. We find that no solver dominates or solves all instances. Solvers focused on finding models and solvers using temporal resolution or fixed-point computation show complementary strengths and weaknesses. This motivates and guides an estimation of the potential of a portfolio solver. It turns out that even combining two solvers in a simple fashion significantly increases the share of solved instances while reducing the CPU time spent. We have made our data available for further analysis [7].

2. We consider the number of solved instances, run time, memory usage, and model size. The analysis is greatly helped by using contour/discrete raw data plots, which complement the traditional cactus plots by preserving the relationship between benchmark instances.

3. The analysis shows complementary behavior between some solvers. This motivates estimating the potential of a portfolio solver. We consider portfolio solvers without communication between members of the portfolio for a best-case scenario (which is unrealistic) and a reference-case scenario (which any portfolio solver should aim to beat). Finally, we show that even a trivially implementable solver that sequentially executes one solver for a short amount of time and, if necessary, then invokes another solver reduces the number of unsolved instances as well as the average run time.

Related Work

Rozier and Vardi compare several explicit-state and symbolic BDD-based model checkers for LTL satisfiability checking [60]. They find the symbolic tools to be superior in terms of performance and, generally, also in terms of quality. They do not consider SAT-based bounded model checkers, tableau-based solvers, or temporal resolution. While they perform an in-depth comparison of solvers using very similar techniques, our focus is on comparing selected representatives of a broad variety of techniques. We also use more benchmark families and consider memory usage and model size.
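The difference between cactus plots and plots that preserve the per-instance relationship can be illustrated with a short sketch. This is not the authors' plotting code; the runtimes, the timeout value, and the two solver names are made-up placeholders, and the functions only prepare the (x, y) data one would hand to a plotting library.

```python
# Illustrative sketch (not the authors' code): preparing data for a
# cactus plot vs. a raw per-instance plot. Runtimes are made up;
# None marks a timeout (unsolved instance).
TIMEOUT = 60.0

runtimes = {  # hypothetical per-instance CPU times for two solvers
    "solverA": [0.1, 5.2, None, 2.0, 0.4],
    "solverB": [1.0, None, 0.3, None, 0.2],
}

def cactus_points(times):
    """Cactus plot: sort solved runtimes ascending; the point (k, t)
    means the solver finishes its k fastest instances within time t
    each. The per-instance correspondence between solvers is lost."""
    solved = sorted(t for t in times if t is not None)
    return list(enumerate(solved, start=1))

def raw_points(times):
    """Raw-data plot: keep the instance index on the x-axis, so the
    same x position refers to the same benchmark instance for every
    solver; timeouts are plotted at the timeout value."""
    return [(i, t if t is not None else TIMEOUT)
            for i, t in enumerate(times)]

for name, ts in runtimes.items():
    print(name, "cactus:", cactus_points(ts))
    print(name, "raw:   ", raw_points(ts))
```

Because `cactus_points` re-sorts each solver independently, two curves cannot be compared instance by instance; `raw_points` keeps that correspondence, which is what makes complementary behavior between solvers visible.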
The same authors compare symbolic constructions of Büchi automata in [59], using the BDD-based engine of Cadence SMV as the backend solver. They show that a portfolio approach to automata construction is advantageous. De Wulf et al. compare NuSMV and ALASKA [69]; for a detailed discussion see Sect. 6. Hustadt et al. perform several comparisons [45, 42, 46] of TRP, a version of LWB, and a version of SMV on the trp benchmark set (see Sect. 4). Goré and Widmann perform an experimental comparison of solvers for CTL [37]. Goranko et al. [35] compare an implementation of Wolper's tableau construction with pltl. For references on solver competitions and on their methodology see App. A of [62]. We are not aware of previous work on portfolio approaches to LTL satisfiability, except for [59]. We use entire solvers as members of a portfolio, while [59] uses different frontends for Büchi automata construction, all relying on the same BDD-based backend solver. For other problem classes see, e.g., [43] (graph coloring, web browsing), [49] (winner determination problem), [34] (constraint satisfaction, mixed integer programming), [70] (SAT), or [58] (QBF).

Organization

In Sect. 2 we introduce notation. In Sect. 3, 4, and 5 we describe solvers, benchmarks, and methodology. Section 6 contains the results of our evaluation. An estimation of the potential of a portfolio solver follows in Sect. 7. Section 8 concludes. Due to space constraints the following parts are in appendices [62]: general concepts and terminology (App. A), details on our benchmark set (App. B), discussion (App. C), and some plots (App. D).
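The trivially implementable sequential portfolio discussed above, which runs one solver under a short time budget and invokes a second solver only if the first fails to finish, can be sketched as follows. The solver commands and the 10-second budget are hypothetical placeholders, not the solvers or configuration used in the evaluation.

```python
import subprocess

def sequential_portfolio(formula_file,
                         first=("fast-solver",),
                         second=("fallback-solver",),
                         budget=10.0):
    """Hedged sketch of a two-member sequential portfolio: run the
    first solver with a small time budget; if it times out, kill it
    and hand the instance to the second solver without a budget.
    The command names and budget are illustrative assumptions."""
    try:
        out = subprocess.run([*first, formula_file],
                             capture_output=True, text=True,
                             timeout=budget)
        return ("first", out.stdout)
    except subprocess.TimeoutExpired:
        pass  # first solver exhausted its budget; fall back
    out = subprocess.run([*second, formula_file],
                         capture_output=True, text=True)
    return ("second", out.stdout)
```

No communication between the members is needed: the second solver starts from scratch, which is exactly the reference-case scenario any more sophisticated portfolio should aim to beat.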
doi:10.1007/978-3-642-24372-1_28