Decisive aspects in the evolution of microprocessors
Proceedings of the IEEE
The incessant market demand for ever higher processor performance called for a continuous increase in clock frequencies as well as an impressive evolution of the microarchitecture. In this paper, we focus on the latter, highlighting major microarchitectural improvements that were introduced to utilize instruction level parallelism (ILP) more effectively in commercial performance-oriented microprocessors. We will show that designers increased the throughput of the microarchitecture at the
ILP level essentially by successively introducing temporal, issue, and intrainstruction parallelism, in such a way that exploiting parallelism along one dimension compelled designers to introduce parallelism along a new dimension as well in order to further increase performance. In addition, each basic technique used to implement parallel operation along a certain dimension inevitably caused processing bottlenecks in the microarchitecture, and eliminating these bottlenecks gave rise to innovative auxiliary techniques. In turn, the auxiliary techniques allowed the basic technique of parallel operation to reach its limits, prompting the debut of a new dimension of parallel operation in the microarchitecture. The sequence of basic and auxiliary techniques devised to increase the efficiency of microarchitectures constitutes a fascinating framework for the evolution of microarchitectures, as presented in our paper.
Keywords: Instruction level parallelism (ILP), intrainstruction parallelism, issue parallelism, microarchitecture, processor performance, temporal parallelism.
1 We note that computer manufacturers typically offer three product groups: 1) expensive high-performance models designed as servers and workstations; 2) basic models emphasizing both cost and performance; and finally 3) low-cost (value) models emphasizing cost over performance. For instance, Intel's Xeon line exemplifies high-performance models; the company's Klamath, Deschutes, Katmai, Coppermine, and Pentium 4 (Willamette and Northwood) cores represent basic models; and its Celeron processors are low-cost (value) models. High-performance models are obviously expensive, since all processor and system components must provide sufficiently high throughput, whereas low-cost systems save cost by using less ambitious and less expensive parts or subsystems.
2 To avoid a large number of repeated references to superscalar processors in the text and in the figures, we give all references to superscalars only in Fig. 28.