Architectural considerations for application-specific counterflow pipelines

B.R. Childers, J.W. Davidson
1999 Proceedings 20th Anniversary Conference on Advanced Research in VLSI  
Application-specific processor design is a promising approach for meeting the performance and cost goals of a system. Application-specific processors are especially promising for embedded systems (e.g., digital cameras, cellular phones, etc.) where a small increase in performance and decrease in cost can have a large impact on a product's viability. Sutherland, Sproull, and Molnar have proposed a new pipeline organization called the Counterflow Pipeline (CFP). This paper evaluates CFP design alternatives and shows that the CFP is an ideal architecture for fast, low-cost design of high-performance processors customized for computation-intensive embedded applications. First, we describe why CFPs are particularly well-suited to realizing application-specific processors. Second, we describe how a CFP tailored to an application can be constructed automatically. Third, we present measurements that evaluate CFP design trade-offs and show that CFPs provide speculative execution, out-of-order execution, and register renaming matched to an application. Fourth, we show that asynchronous counterflow pipelines achieve high performance by reducing the average execution latency of instructions relative to synchronous implementations. Finally, we demonstrate that custom CFPs achieve cycles-per-instruction measurements that are competitive with 4-way superscalar out-of-order processors at a potentially low design complexity.

1: Introduction

Application-specific processor design is a promising approach for improving the cost-performance ratio of an application. Application-specific processors are especially useful for embedded systems (e.g., automobile control systems, avionics, cellular phones, etc.) where a small increase in performance and decrease in cost can have a large impact on a product's viability. An innovative computer organization called the Counterflow Pipeline (CFP), proposed by Sproull, Sutherland, and Molnar [27], has several characteristics that make it an ideal target organization for the synthesis of application-specific processors. The CFP has a simple and regular structure, local control, a high degree of modularity, asynchronous implementations, and inherent handling of complex structures such as register renaming and speculative execution.

Modern instruction-level parallel (ILP) processors must be able to tolerate high-latency operations and the frequent presence of control transfer operations.
As an example, the 4-way superscalar HP PA-8000 microprocessor [17] tolerates a cache miss penalty of 50 clock cycles, which may cause the processor to stall for up to 200 instructions (4 instructions per cycle over 50 cycles). Keeping aggressive superscalar designs busy requires large instruction windows and other structures (e.g., register rename buff-
doi:10.1109/arvlsi.1999.756034 dblp:conf/arvlsi/ChildersD99 fatcat:sm2hu2vhrvghjeifj2ca3p7zem