Off-chip communication architectures for high throughput network processors

Jacob Engel, Taskin Kocak
2009 Computer Communications  
In this work, we present off-chip communications architectures for line cards to increase the throughput of the currently used memory system. In recent years there is a significant increase in memory bandwidth demand on line cards as a result of higher line rates, an increase in deep packet inspection operations and an unstoppable expansion in lookup tables. As line-rate data and NPU processing power increase, memory access time becomes the main system bottleneck during data store/retrieve
more » ... tions. The growing demand for memory bandwidth contrasts the notion of indirect interconnect methodologies. Moreover, solutions to the memory bandwidth bottleneck are limited by physical constraints such as area and NPU I/O pins. Therefore, indirect interconnects are replaced with direct, packet-based networks such as mesh, torus or k-ary n-cubes. We investigate multiple k-ary n-cube based interconnects and propose two variations of 2-ary 3-cube interconnect called the 3D-bus and 3D-mesh. All of the k-ary n-cube interconnects include multiple, highly efficient techniques to route, switch, and control packet flows in order to minimize congestion spots and packet loss. We explore the tradeoffs between implementation constraints and performance. We also developed an event-driven, interconnect simulation framework to evaluate the performance of packet-based off-chip k-ary n-cube interconnect architectures for line cards. The simulator uses the state-of-the-art software design techniques to provide the user with a flexible yet robust tool, that can emulate multiple interconnect architectures under non-uniform traffic patterns. Moreover, the simulator offers the user with full control over network parameters, performance enhancing features and iii simulation time frames that make the platform as identical as possible to the real line card physical and functional properties. By using our network simulator, we reveal the best processor-memory configuration, out of multiple configurations, that achieves optimal performance. Moreover, we explore how network enhancement techniques such as virtual channels and sub-channeling improve network latency and throughput. Our performance results show that k-ary n-cube topologies, and especially our modified version of 2-ary 3-cube interconnect -the 3D-mesh, significantly outperform existing line card interconnects and are able to sustain higher traffic loads. The flow control mechanism proved to extensively reduce hot-spots, load-balance areas of high traffic rate and achieve low transmission failure rate. Moreover, it can scale to adopt more memories and/or processors and as a result to increase the line card's processing power. iv To my parents, v ACKNOWLEDGMENTS
doi:10.1016/j.comcom.2008.12.043 fatcat:n44wcdfjgjhzji5tzqgx2abo3q