Fundamental performance constraints in horizontal fusion of in-order cores
High-Performance Computer Architecture
A conceptually appealing approach to supporting a broad range of workloads is a system comprising many small cores that can be fused, on demand, into larger cores. We demonstrate that using in-order cores for this purpose, even under idealized assumptions about fusion-related overheads, would introduce fundamental obstacles to achieving good performance -obstacles that are not present when out-oforder cores are used. Matching the performance of modern dynamically-scheduled designs demands that
... signs demands that a fused machine be able to simultaneously manage a large number of active dataflow chains, many more than the amount of ILP typically extracted from the code. When it is in-order cores that are fused, this requirement, in turn, demands either that the active dataflow chains be carefully interleaved among the available issue queues, or that enough cores be provided for them to reside at distinct queues. Using an abstract model for reasoning about the performance of these machines, we show that the former option is fundamentally hard, in the sense that it necessitates instruction steering hardware that would be too complex to build. The latter option would demand so many cores that the machine would be overwhelmed by fusion-related overheads. In short, if the goal is to match the performance of modern dynamically-scheduled machines, fusion of in-order cores is not a very compelling approach; either a fundamentally new method for fusing cores is needed, or some form of out-of-order capability must be provided at the constituent cores.