Analysis of benchmark characteristics and benchmark performance prediction

Rafael H. Saavedra, Alan J. Smith
1996 ACM Transactions on Computer Systems  
Standard benchmarking provides the run times for given programs on given machines, but fails to provide insight as to why those results were obtained (either in terms of machine or program characteristics), and fails to provide run times for that program on some other machine, or for other programs on that machine. We have developed a machine-independent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify those operations, either on the machine or in the programs, which dominate the benchmark results. This information helps designers improve the performance of future machines, and helps users tune their applications to better utilize the performance of existing machines. Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensive run-time statistics for a large set of benchmarks, including the SPEC and Perfect Club suites. We show how these statistics can be used to identify important shortcomings in the programs. In addition, we give execution time estimates for a large sample of programs and machines and compare these against benchmark results. Finally, we develop a metric for program similarity that makes it possible to classify benchmarks with respect to a large set of characteristics.

PROGRAM STATISTICS FOR THE TRFD BENCHMARK ON THE IBM RS/6000 530
Lines processed: 1 to 485 [485]

mnem   operation    times-executed (fraction)   execution-time (fraction)
[arsl] add    (002)  exec: 7          (0.0000)  time: 0.000001    (0.0000)
[sisl] store  (015)  exec: 6583752    (0.0043)  time: 0.000000    (0.0000)
[aisl] add    (016)  exec: 9497124    (0.0062)  time: 1.292559    (0.0036)
[misl] mult   (017)  exec: 196        (0.0000)  time: 0.000031    (0.0000)
[disl] divide (018)  exec: 210        (0.0000)  time: 0.000198    (0.0000)
[tisl] trans  (021)  exec: 101949     (0.0001)  time: 0.012071    (0.0000)
[srdl] store  (022)  exec: 216205010  (0.1416)  time: 2.832286    (0.0079)
[ardl] add    (023)  exec: 215396153  (0.1411)  time: 23.090467   (0.0642)
[mrdl] mult   (024)  exec: 214742010  (0.1406)  time: 22.504963   (0.0626)
[drdl] divide (025)  exec: 735371     (0.0005)  time: 0.563588    (0.0016)
[erdl] exp-i  (026)  exec: 28         (0.0000)  time: 0.000002    (0.0000)
[trdl] trans  (028)  exec: 18545814   (0.0121)  time: 1.743307    (0.0048)
[sisg] store  (043)  exec: 175        (0.0000)  time: 0.000000    (0.0000)
[aisg] add    (044)  exec: 730303     (0.0005)  time: 0.110495    (0.0003)
[misg] mult   (045)  exec: 35         (0.0000)  time: 0.000005    (0.0000)
[tisg] trans  (049)  exec: 9          (0.0000)  time: 0.000003    (0.0000)
[andl] and-or (057)  exec: 1          (0.0000)  time: 0.000000    (0.0000)
[cisl] i-sin  (060)  exec: 1514464    (0.0010)  time: 0.426170    (0.0012)
[crdl] r-dou  (061)  exec: 6723500    (0.0044)  time: 2.989268    (0.0083)
[crdg] r-dou  (066)  exec: 2          (0.0000)  time: 0.000001    (0.0000)
[proc] proc   (067)  exec: 5289       (0.0000)  time: 0.001074    (0.0000)
[argl] argums (068)  exec: 5394       (0.0000)  time: 0.001101    (0.0000)
[arr1] in:1-s (071)  exec: 166300304  (0.1089)  time: 33.060501   (0.0919)
[arr2] in:2-s (072)  exec: 499858800  (0.3274)  time: 204.792156  (0.5696)
[loin] do-ini (076)  exec: 7474649    (0.0049)  time: 1.456062    (0.0040)
[loov] do-lop (077)  exec: 162509732  (0.1064)  time: 64.678873   (0.1799)
[loix] do-ini (078)  exec: 1          (0.0000)  time: 0.000002    (0.0000)
[loox] do-lop (079)  exec: 7          (0.0000)  time: 0.000004    (0.0000)

Predicted execution time = 359.555187 secs

... any of the machines that makes its overall execution time larger than expected [Saav90].

Related Work

Several papers have proposed different approaches to execution time prediction, with significant differences in their degrees of accuracy and applicability. These attempts have ranged from simple Markov chain models [Rama65, Beiz70] to more complex approaches that involve solving a set of recursive performance equations [Hick88].
Here we mention three proposals that are somewhat related to our concept of an abstract machine model and the use of static and dynamic program statistics. One way to compare machines is to do an analysis similar to ours, but at the level of the machine instruction set [Peut77]. This approach only permits comparisons between machines which implement the same instruction set.

In the context of the PTRAN project [Alle87], execution time prediction has been proposed as a technique to help in the automatic partitioning of parallel programs into tasks. In [Sark89], execution profiles are obtained indirectly by collecting statistics on all the loops of a possibly unstructured program, and then combining these with an analysis of the control dependence graph.

In [Bala91], a prototype of a static performance estimator is presented that could be used by a parallel compiler to guide data-partitioning decisions. These performance estimates are computed from machine measurements obtained using a set of routines called the training set. The training set is similar to our machine characterizer; in addition to the basic CPU measurements, it also contains tests to measure the performance of communication primitives in a loosely synchronous distributed-memory machine. The compiler then makes a static analysis of the program and combines this information with data produced by the training set. A prototype of the performance estimator has been implemented in the ParaScope interactive parallel programming environment [Bala89]. In contrast to our execution time predictions, the compiler does not incorporate dynamic program information; the user must supply the lower and upper bounds of symbolic variables used for do loops, and branching probabilities for if-then statements (or use the default probabilities provided by the compiler).
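The prediction shown in the TRFD table above is, in essence, the sum over all abstract operations of the dynamic execution count multiplied by the per-operation time measured for the target machine. The following Python sketch (not code from the paper) illustrates that combination step using four of the dominant TRFD rows; the dictionary layout, the function name predict_time, and the back-derived per-operation times are assumptions made purely for illustration.

# Sketch of the linear prediction model described above:
# predicted time = sum over abstract operations of (dynamic count x per-operation time).
# Mnemonics and numbers are illustrative, taken from four rows of the
# TRFD / IBM RS/6000 530 table; a full characterization covers many more operations.

# Machine characterization: seconds per abstract operation on the target machine
# (here back-derived from the table as time column / exec column).
machine_params = {
    "ardl": 23.090467 / 215396153,   # double-precision add
    "mrdl": 22.504963 / 214742010,   # double-precision multiply
    "arr2": 204.792156 / 499858800,  # two-subscript array index computation
    "loov": 64.678873 / 162509732,   # do-loop iteration overhead
}

# Program characterization: dynamic counts of each abstract operation,
# obtained by instrumenting and running the benchmark once.
program_counts = {
    "ardl": 215396153,
    "mrdl": 214742010,
    "arr2": 499858800,
    "loov": 162509732,
}

def predict_time(counts, params):
    """Predicted CPU time: dot product of operation counts and per-operation times."""
    return sum(n * params.get(op, 0.0) for op, n in counts.items())

print(f"Predicted time for these operations: {predict_time(program_counts, machine_params):.1f} secs")
# About 315 of the 359.6 predicted seconds come from these four operation classes alone,
# which is the kind of 'dominant operation' insight the methodology is meant to expose.

Because the per-operation times depend only on the machine and the counts depend only on the program, either factor can be replaced to estimate a new machine/program combination without rerunning the benchmark, which is the point of separating the two characterizations.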
doi:10.1145/235543.235545