Proceedings of the 37th annual international symposium on Computer architecture - ISCA '10
Chip Multiprocessors (CMPs) are now commodity hardware, but commoditization of parallel software remains elusive. In the near term, the current trend of increased coreper-socket count will continue, despite a lack of parallel software to exercise the hardware. Future CMPs must deliver thread-level parallelism when software provides threads to run, but must also continue to deliver performance gains for single threads by exploiting instructionlevel parallelism and memory-level parallelism.
... r, power limitations will prevent conventional cores from exploiting both simultaneously. This work presents the Forwardflow Architecture, which can scale its execution logic up to run single threads, or down to run multiple threads in a CMP. Forwardflow dynamically builds an explicit internal dataflow representation from a conventional instruction set architecture, using forward dependence pointers to guide instruction wakeup, selection, and issue. Forwardflow's backend is organized into discrete units that can be individually (de-)activated, allowing each core's performance to be scaled by system software at the architectural level. On single threads, Forwardflow core scaling yields a mean runtime reduction of 21% for a 37% increase in power consumption. For multithreaded workloads, a Forwardflowbased CMP allows system software to select the performance point that best matches available power.