Hardware Support for Accelerating Data Movement in Server Platform

Li Zhao, Laxmi N. Bhuyan, Ravi Iyer, Srihari Makineni, Donald Newell
2007 IEEE transactions on computers  
Data movement (memory copies) is a very common operation during network processing and application execution on servers. The performance of this operation is rather poor on today's microprocessors due to the following aspects: 1) Several longlatency memory accesses are involved because the source and/or the destination are typically in memory, 2) latency hiding techniques, such as out-of-order execution, hardware threading, and prefetching, are not very effective for bulk data movement, and 3)
more » ... icroprocessors move data at register (small) granularity. In this paper, we show this overhead of bulk data movement and propose the use of dedicated copy engines to minimize it. We present a detailed analysis of copy engine architectures along two dimensions: 1) on-die versus off-die and 2) synchronous versus asynchronous. These copy engine architectures are superior to traditional Direct Memory Access (DMA) engines because they are tightly coupled to the core architecture and enable lower overhead communication and signaling. We describe the hardware support required to implement these copy engines and integrate them into server platforms. We perform a detailed case study to evaluate the performance of these copy engines. The evaluation is based on an execution-driven simulator, which was extended with detailed models of copy engines. Our simulation results show that copy engines are effective in reducing the bulk data movement overhead and, hence, hold significant promise for high-performance server platforms.
doi:10.1109/tc.2007.1036 fatcat:dgkpxmdpybfrzl2vyepg6ke7ra