Instruction set extensions for photonic synchronous coalesced accesses

Paul Keltcher, David Whelihan, Jeffrey Hughes
2013 IEEE High Performance Extreme Computing Conference (HPEC)
Microprocessors have evolved over the last forty-plus years from purely sequential single-operation machines, to pipelined superscalar, to threaded and SIMD, and finally to multi-core and massively multi-core/multi-thread machines. Despite these advances, the conceptual model programmers use to program them is still that of a single-threaded, register-file-bound math unit that can only be loosely synchronized with other such processors. This lack of explicit synchrony, caused by limitations of metal interconnect, limits parallel efficiency. Recent advances in silicon photonic-enabled architectures [1, 5, 7] promise to enable high synchrony over long distances (centimeters or more). This paper shows that global synchrony changes the way computers can be programmed by introducing a new class of ISA-level instruction: the globally-synchronous load-store. In the context of multiple load-store machines, the globally-synchronous load-store architecture allows the programmer to think about a collection of independent load-store machines as a single load-store machine. This operation is described, and its ISA implications are explored in the context of the distributed matrix transpose, which exhibits a high degree of data non-locality and is difficult to parallelize efficiently on modern architectures.
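To make the data non-locality of the distributed matrix transpose concrete, the following is a minimal software sketch, not the paper's method. It does not model photonic hardware or the proposed globally-synchronous load-store instruction. It assumes a square matrix split into contiguous row blocks across a set of nodes and counts how many element reads must cross node boundaries. The function names `distribute_rows` and `distributed_transpose` and the row-block layout are hypothetical choices made for illustration.

```python
def distribute_rows(matrix, num_nodes):
    """Split a square matrix into contiguous row blocks, one block per node.

    Assumption for this sketch: num_nodes divides the matrix size evenly.
    """
    n = len(matrix)
    rows_per_node = n // num_nodes
    return [matrix[p * rows_per_node:(p + 1) * rows_per_node]
            for p in range(num_nodes)]


def distributed_transpose(local_blocks):
    """Transpose a row-distributed matrix and count remote element reads.

    Returns (transposed_blocks, remote_reads), where transposed_blocks uses
    the same row-block layout as the input.
    """
    num_nodes = len(local_blocks)
    rows_per_node = len(local_blocks[0])
    n = num_nodes * rows_per_node

    result = [[[None] * n for _ in range(rows_per_node)]
              for _ in range(num_nodes)]
    remote_reads = 0

    for dst in range(num_nodes):
        for r in range(rows_per_node):
            global_row = dst * rows_per_node + r  # row index in the transpose
            for c in range(n):
                # T[global_row][c] = A[c][global_row]; row c of A lives on src.
                src, src_row = divmod(c, rows_per_node)
                result[dst][r][c] = local_blocks[src][src_row][global_row]
                if src != dst:
                    remote_reads += 1  # element owned by a different node

    return result, remote_reads


# Usage: a 4x4 matrix spread over 2 nodes.
a = [[4 * i + j for j in range(4)] for i in range(4)]
blocks = distribute_rows(a, 2)
t_blocks, remote = distributed_transpose(blocks)
```

Under this layout, each node owns only a 1/P share of the elements it needs, so a fraction 1 - 1/P of all reads is remote for P nodes. In the 4x4, two-node case above, that is 8 of 16 reads.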
doi:10.1109/hpec.2013.6670326 dblp:conf/hpec/KeltcherWH13 fatcat:y7fki3y375fsvpdumbgldwy4ze