Soft MOUSETRAP: A Bundled-Data Asynchronous Pipeline Scheme Tolerant to Random Variations at Ultra-Low Supply Voltages

Jian Liu, Steven M. Nowick, Mingoo Seok
2013 IEEE 19th International Symposium on Asynchronous Circuits and Systems
OVERVIEW My main research is on asynchronous and mixed-timing digital design. Asynchronous circuits have no centralized or global clock. Instead, they are distributed hardware systems in which multiple components coordinate and synchronize at their own rate over communication channels. As chips grow increasingly larger and faster, power and design-time requirements become more aggressive, and timing variability becomes a critical factor, there are increasing challenges in assembling ... d synchronous systems. My key goal is to make asynchronous digital design a viable option. Asynchronous design has the potential to offer significant improvements in performance, energy, reliability and scalability, since it eliminates the rigidity and overhead of the fixed-rate clock and allows flexible and distributed assembly and communication of components. In particular, it can provide low power (components are activated only on demand, without the need to instrument clock gating, and the global clock is eliminated entirely); high performance (some asynchronous systems have significantly lower latency and higher average throughput, rather than being bound to a worst-case clock rate); great robustness to timing variability and unpredictability; and modularity and composability. There is also a recent surge of industry interest in hybrid designs, which connect standard synchronous components (e.g. processors, memories) through flexible asynchronous interconnection networks, forming globally-asynchronous locally-synchronous (GALS) systems, where the asynchronous network provides a scalable and reliable integration medium.
Experimental Results. A post-layout evaluation of the new switch design, in comparison with the synchronous xpipesLite implementation, demonstrated a reduction in overall power of 85%/73% (vs. synchronous without/with clock gating), a 71% reduction in switch area, and a 44% reduction in average energy per flit, while maintaining nearly comparable throughput (903 ps/cycle) in a 45 nm low-power technology. Work remains to be done on improving and better controlling the tool flow, and on reducing overheads in link-level pipelining. For VCs, we demonstrated that a replicated switch with distinct VC control on links is the best solution. The above designs are almost entirely standard-cell based, making them practical for commercial application.
Our solution provides a unique direct comparison with a state-of-the-art synchronous design (xpipesLite) and demonstrates significant overall cost benefits. In particular, it shows that high-performance asynchronous designs can have significantly lower area than synchronous designs, and can consume much lower average power than even clock-gated synchronous designs.
doi:10.1109/async.2013.29 dblp:conf/async/LiuNS13 fatcat:pgi4on5mbffj3fylrygmhghspy