Limits of instruction-level parallelism

David W. Wall
1991 ACM SIGOPS Operating Systems Review  
The Western Research Laboratory (WRL) is a computer systems research group that was founded by Digital Equipment Corporation in 1982. Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There are two other research laboratories located in Palo Alto, the Network Systems Laboratory (NSL) and the Systems Research Center (SRC).
Other Digital research groups are located in Paris (PRL) and in Cambridge, Massachusetts (CRL).

Our research is directed towards mainstream high-performance computer systems. Our prototypes are intended to foreshadow the future computing environments used by many Digital customers. The long-term goal of WRL is to aid and accelerate the development of high-performance uni- and multi-processors. The research projects within WRL will address various aspects of high-performance computing.

We believe that significant advances in computer systems do not come from any single technological advance. Technologies, both hardware and software, do not all advance at the same pace. System design is the art of composing systems which use each level of technology in an appropriate balance. A major advance in overall system performance will require reexamination of all aspects of the system. We do work in the design, fabrication, and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new application areas that are opening up with the advent of higher performance systems. Researchers at WRL cooperate closely and move freely among the various levels of system design. This allows us to explore a wide range of tradeoffs to meet system goals.

Abstract

Growing interest in ambitious multiple-issue machines and heavily pipelined machines requires a careful examination of how much instruction-level parallelism exists in typical programs. Such an examination is complicated by the wide variety of hardware and software techniques for increasing the parallelism that can be exploited, including branch prediction, register renaming, and alias analysis. By performing simulations based on instruction traces, we can model techniques at the limits of feasibility and even beyond. This paper presents the results of simulations of 18 different test programs under 375 different models of available parallelism analysis. This paper replaces Technical Note TN-15, an earlier version of the same material.
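To make the abstract's methodology concrete, here is a minimal sketch, not Wall's actual simulator, of the core of a trace-based limit study: replay a dynamic instruction trace, place each instruction in the earliest cycle its dependences allow, and report parallelism as trace length divided by critical-path depth. The trace format, the unit instruction latency, and the perfect_renaming switch (true dependences only, versus also honoring write-after-write and write-after-read hazards on architectural registers) are all illustrative assumptions.

```python
from collections import defaultdict

def ilp_upper_bound(trace, perfect_renaming=True):
    """Return instructions / critical-path cycles for a dynamic trace.

    trace: iterable of (dests, srcs) register-number tuples, in executed
    order. With perfect_renaming, only true (read-after-write) dependences
    constrain issue; without it, write-after-write and write-after-read
    hazards on architectural register names also constrain it.
    """
    ready = defaultdict(int)      # cycle in which each register's value is ready
    last_read = defaultdict(int)  # last cycle in which each register was read
    depth = 0                     # critical-path length in cycles
    count = 0                     # dynamic instruction count

    for dests, srcs in trace:
        # True dependences: issue no earlier than our source producers.
        cycle = max((ready[r] for r in srcs), default=0)
        if not perfect_renaming:
            # Name dependences: a write must wait for the prior write to
            # the same register (WAW) and for its prior readers (WAR).
            for r in dests:
                cycle = max(cycle, ready[r], last_read[r])
        cycle += 1                # unit latency for every instruction
        for r in srcs:
            last_read[r] = max(last_read[r], cycle)
        for r in dests:
            ready[r] = cycle
        depth = max(depth, cycle)
        count += 1

    return count / depth if depth else 0.0

# Tiny synthetic trace: (dest_regs, src_regs) per dynamic instruction.
trace = [((1,), ()), ((2,), (1,)), ((1,), ()), ((3,), (1,))]
print(ilp_upper_bound(trace))                          # 2.0: true dependences only
print(ilp_upper_bound(trace, perfect_renaming=False))  # 1.0: name dependences too
```

On this toy trace, perfect renaming doubles the measured parallelism because the reuse of register 1 serializes the second pair of instructions behind the first; the gap between such ambitious and conservative models is exactly the kind of spread the study's 375 models quantify.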
Author's note

Three years ago I published some preliminary results of a simulation-based study of instruction-level parallelism [Wall91]. It took advantage of a fast instruction-level simulator and a computing environment in which I could use three or four dozen machines with performance in the 20-30 MIPS range every night for many weeks. But the space of parallelism techniques to be explored is very large, and that study only scratched the surface. The report you are reading now is an attempt to fill some of the cracks, both by simulating more intermediate models and by considering a few ideas the original study did not consider. I believe it is by far the most extensive study of its kind, requiring almost three machine-years and simulating in excess of 1 trillion instructions.

The original paper generated many different opinions.¹ Some looked at the high parallelism available from very ambitious (some might say unrealistic) models and proclaimed the millennium. My own opinion was pessimistic: I looked at how many different things you have to get right, including things this study doesn't address at all, and despaired. Since then I have moderated that opinion somewhat, but I still consider the negative results of this study to be at least as important as the positive.

This study produced far too many numbers to present them all in the text and graphs, so the complete results are available only in the appendix. I have tried not to editorialize in the selection of which results to present in detail, but a careful study of the numbers in the appendix may well reward the obsessive reader.

In the three years since the preliminary paper appeared, multiple-issue architectures have changed from interesting idea to revealed truth, though little hard data is available even now. I hope the results in this paper will be helpful. It must be emphasized, however, that they should be treated as guideposts and not mandates. When one contemplates a new architecture, there is no substitute for simulations that include real pipeline details, a likely memory configuration, and a much larger program suite than a study like this one can include.

¹ Probably exactly as many opinions as there were before it appeared.
doi:10.1145/106974.106991