Dynamically managing the communication-parallelism trade-off in future clustered processors

Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi
2003 Proceedings of the 30th annual international symposium on Computer architecture - ISCA '03  
Clustered microarchitectures are an attractive alternative to large monolithic superscalar designs due to their potential for higher clock rates in the face of increasingly wire-delay-constrained process technologies. As increasing transistor counts allow an increase in the number of clusters, thereby allowing more aggressive use of instruction-level parallelism (ILP), the inter-cluster communication increases as data values get spread across a wider area. As a result of the emergence of this trade-off between communication and parallelism, a subset of the total on-chip clusters is optimal for performance. To match the hardware to the application's needs, we use a robust algorithm to dynamically tune the clustered architecture. The algorithm, which is based on program metrics gathered at periodic intervals, achieves an 11% performance improvement on average over the best statically defined architecture. We also show that the use of additional hardware and reconfiguration at basic block boundaries can achieve average improvements of 15%. Our results demonstrate that reconfiguration provides an effective solution to the communication and parallelism trade-off inherent in the communication-bound processors of the future.
able option only if the IPC degradation does not offset the clock speed improvement. Modern processors like the Alpha 21264 [24] at
doi:10.1145/859618.859650 fatcat:65mb7apno5ajxkwevbozfjbnki
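
The abstract describes tuning the number of active clusters from program metrics gathered at periodic intervals, without giving the algorithm itself. The following Python sketch is only an illustration of that general idea under stated assumptions, not the authors' method: it tries each candidate cluster count for one interval, keeps the one with the highest measured IPC for a fixed number of intervals, and then explores again. The names `explore_then_exploit`, `measure_ipc`, `steady_intervals`, and `toy_ipc` are hypothetical, and the fixed re-exploration period and the toy IPC model are assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence


def explore_then_exploit(
    candidates: Sequence[int],
    measure_ipc: Callable[[int, int], float],
    total_intervals: int,
    steady_intervals: int,
) -> List[int]:
    """Return the cluster count used in each interval.

    Assumption: measure_ipc(n_clusters, interval) runs one interval with
    n_clusters active and reports the IPC observed in that interval.
    """
    if not candidates:
        raise ValueError("at least one candidate cluster count is required")

    schedule: List[int] = []
    interval = 0
    while interval < total_intervals:
        # Exploration: spend one interval on each candidate configuration.
        observed: Dict[int, float] = {}
        for n in candidates:
            if interval >= total_intervals:
                break
            observed[n] = measure_ipc(n, interval)
            schedule.append(n)
            interval += 1

        # Keep the configuration with the highest IPC seen while exploring.
        best = max(observed, key=observed.get)

        # Exploitation: hold that configuration for a fixed number of intervals.
        for _ in range(steady_intervals):
            if interval >= total_intervals:
                break
            measure_ipc(best, interval)
            schedule.append(best)
            interval += 1
    return schedule


def toy_ipc(n_clusters: int, interval: int) -> float:
    """Made-up IPC model: usable parallelism saturates at the program's ILP,
    while communication cost grows with the number of active clusters."""
    available_ilp = 4 if interval < 10 else 16  # program phase changes at interval 10
    return min(n_clusters, available_ilp) - 0.05 * n_clusters


schedule = explore_then_exploit([2, 4, 8, 16], toy_ipc, total_intervals=20, steady_intervals=6)
```

In this toy model a small cluster count wins while little parallelism is available and a larger one wins after the phase change, which mirrors the communication-parallelism trade-off the abstract describes. A real implementation would need a policy for detecting phase changes and choosing the interval length, which this sketch does not attempt.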