Reconciling performance and programmability in networking systems
Computer Communication Review
Challenges in addressing the memory bottleneck have made it difficult to design a packet processing platform that achieves both ease of programming and high performance. Today's commercial processors support two architectural mechanisms, namely hardware multithreading and caching, to overcome the memory bottleneck. The configurations of these mechanisms (e.g., cache capacity, number of threads per processor core) are fixed at processor-design time. The relative effectiveness of these mechanisms, however, varies significantly with application, traffic, and system characteristics. Thus, programmers often struggle to achieve high performance from a processor that is not well suited to a particular deployment. To address this challenge, we first make a case for, and then develop, a malleable processor architecture that facilitates the dynamic reconfiguration of cache capacity and number of threads to best suit the needs of each deployment. We then present an algorithm that determines the optimal thread-cache balance at run time. Together, the architecture and the algorithm allow us to achieve ease of programming and high performance simultaneously. We demonstrate that our processor outperforms a processor similar to Intel's IXP2800, a state-of-the-art commercial network processor, in about 89% of the deployments we consider. Further, in about 30% of the deployments, our platform improves throughput by as much as 300%.