Taming the IXP network processor

Lal George, Matthias Blume
2003 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation - PLDI '03  
We compile Nova, a new language designed for writing network processing applications, using a back end based on integer-linear programming (ILP) for register allocation, optimal bank assignment, and spills. The compiler's optimizer employs CPS as its intermediate representation; some of the invariants that this IR guarantees are essential for the formulation of a practical ILP model. Appel and George used a similar ILP-based technique for the IA32 to decide which variables reside in registers but deferred the actual assignment of colors to a later phase. We demonstrate how to carry over their idea to an architecture with many more banks, register aggregates, variables with multiple simultaneous register assignments, and, very importantly, one where bank assignment and register assignment cannot be done in isolation from each other. Our approach performs well in practice, without causing an explosion in the size or solve time of the generated integer linear programs.

Figure 1: Micro-engine architecture. [Diagram not reproduced. Recoverable labels: A Mux, B Mux; SDRAM memory (256 Mb); SRAM memory (8 Mb); load and store transfer registers (8 SDRAM, 8 SRAM); GPR banks A and B (16 registers each); transfer banks L, LD, S, SD.]

When processing data at gigabit line rates, network processors such as the Intel IXP employ fairly unusual designs which make it hard to write programs for them. Our work is an attempt to address this problem. We have focused on the IXP1200, but all of the ideas carry over to newer generations of the architecture. The IXP1200 consists of a StrongARM core and six micro-engines with hardware-supported multi-threading. Figure 1 shows the basic architecture as seen from the vantage point of a single micro-engine thread. There are six register banks: two general-purpose banks (A and B); two banks forming the interface to external SRAM memory (L and S); and two forming the interface to external SDRAM memory (LD and SD). The L and LD transfer banks are the destinations of all memory loads; S and SD are the sources of all stores. Input to the ALU can come from L, LD, A, or B, but each of A, B, and L ∪ LD can supply at most one operand. Results from the ALU can go to A, B, S, or SD. There is no direct path from any register in a transfer bank to another register in the same transfer bank. Not shown in the figure is an on-chip scratch memory M, also accessed via L and LD.
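The operand and result rules above can be restated as a small check. The following is only an illustrative sketch: the function names are hypothetical, a two-operand ALU instruction is assumed, and the rules encoded are just those given in the prose (sources from A, B, L, or LD; at most one operand from each of A, B, and L ∪ LD; results to A, B, S, or SD).

```python
# Bank names follow the text. Everything else here is a hypothetical helper.
ALU_SOURCES = {"A", "B", "L", "LD"}   # banks the ALU can read from
ALU_DESTS = {"A", "B", "S", "SD"}     # banks the ALU can write to


def operand_group(bank):
    """Map a source bank to its operand group; L and LD share one group."""
    return "L/LD" if bank in ("L", "LD") else bank


def alu_op_is_legal(src1, src2, dest):
    """Check a two-operand ALU instruction against the stated bank rules."""
    if src1 not in ALU_SOURCES or src2 not in ALU_SOURCES:
        return False
    if dest not in ALU_DESTS:
        return False
    # Each of A, B, and L ∪ LD may supply at most one operand.
    return operand_group(src1) != operand_group(src2)
```

Under these assumptions, `alu_op_is_legal("A", "B", "S")` holds, while two operands from A, or one from L and one from LD, are rejected. This is the kind of constraint that forces a compiler to think about bank assignment at every instruction.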
IXP programming issues

To the compiler, the IXP hardware presents a combination of several difficult problems for which there are no good published heuristics. As a result, the state of the art in programming the IXP is still (a very quirky) assembly. A high-level programming language for the IXP and its compiler must address:

Few registers: Because of the penalty for memory accesses, lack of data caches, and real-time constraints, spilling (not to mention the use of a stack) is nearly intolerable.

Bank assignment: The IXP has many different register banks with quite different characteristics. Which program variable should be allocated to which bank and when?

Register aggregates: Transactions to memory are performed in sets of adjacent registers called aggregates. Several aggregate sizes anywhere in the range 1 to 8 occur in most programs. Where should an aggregate be allocated within a bank? If the bank is fragmented, which variables should be moved out to accommodate the aggregate, where should the evicted values go, and when should this happen?

Limited data paths: After a variable has been moved to S or SD, it cannot be moved back to another bank without going through memory. If its value is required elsewhere in the program, then it should have been duplicated before being moved. When and how should such duplication occur?

Data structures and alignment: Access to SDRAM memory is restricted to 8-byte boundaries and access to SRAM to 4-byte boundaries. Real-world packet data does not respect these alignment requirements. How can one effectively deal with misalignment in conjunction with header field extraction?

Fine control: How should one provide the necessary knobs to access specialized hardware registers for I/O and concurrency control in a high-level language?
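To make the alignment issue concrete, here is a minimal sketch of how many aligned memory units a misaligned field touches. The function name and the example offsets are hypothetical. Only the 8-byte (SDRAM) and 4-byte (SRAM) boundaries come from the text, and the sketch does not model how fetched units map onto transfer registers.

```python
def covering_words(offset, length, word_size):
    """Return (aligned_start, word_count) for the aligned units that cover
    the byte range [offset, offset + length)."""
    if offset < 0 or length <= 0 or word_size <= 0:
        raise ValueError("offset must be >= 0; length and word_size must be > 0")
    start = (offset // word_size) * word_size            # round down
    end = -(-(offset + length) // word_size) * word_size  # round up
    return start, (end - start) // word_size


# A hypothetical 4-byte header field starting at byte offset 14.
sram_start, sram_words = covering_words(14, 4, word_size=4)    # SRAM: 4-byte units
sdram_start, sdram_words = covering_words(14, 4, word_size=8)  # SDRAM: 8-byte units
```

In this example the field straddles a boundary in both memories, so the load must fetch two aligned units and the field then has to be extracted by shifting and masking. A field that starts on a boundary and fits in one unit needs only a single unit.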
Our Approach and Contributions

The Nova language and its compiler address all of these issues with good results:

Optimal bank assignment: Our formulation of integer-linear programming (ILP) generates an optimal bank assignment including spill considerations.

Allocation of aggregates: Allocation of aggregates strongly interacts with bank assignment and is difficult to solve heuristically. Therefore, we use ILP to solve the two problems together.

Static single use: Our compiler makes use of a static single use property enforced for certain variables, enabling the register allocator to place these variables into multiple registers at the same time when doing so is beneficial.
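As a rough illustration of what "optimal bank assignment" means as an optimization problem, the toy below picks one bank per variable so that as few ALU instructions as possible have both operands in the same bank. It is not the paper's ILP model: exhaustive search stands in for an ILP solver, only banks A and B are considered, and the cost function, function name, and inputs are all assumptions made for the example.

```python
from itertools import product


def best_bank_assignment(variables, alu_ops, banks=("A", "B")):
    """Exhaustively choose one bank per variable, minimising conflicts.

    alu_ops is a list of (u, v) operand pairs. A pair whose operands end up
    in the same bank is assumed to need one extra move, so it costs 1.
    """
    best_cost, best = None, None
    for choice in product(banks, repeat=len(variables)):
        assignment = dict(zip(variables, choice))
        cost = sum(1 for u, v in alu_ops if assignment[u] == assignment[v])
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, assignment
    return best_cost, best


# x and y are used together, as are y and z: y must sit opposite x and z.
chain_cost, chain = best_bank_assignment(["x", "y", "z"], [("x", "y"), ("y", "z")])
# All three pairs are used together: with two banks one conflict is unavoidable.
tri_cost, _ = best_bank_assignment(
    ["x", "y", "z"], [("x", "y"), ("y", "z"), ("x", "z")]
)
```

An ILP encoding of the same toy would use one 0-1 variable per (variable, bank) pair and linear constraints in place of the enumeration. That matters because the search space here grows exponentially with the number of variables.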
doi:10.1145/781131.781135 dblp:conf/pldi/GeorgeB03 fatcat:hnhj2lmi3ndjlc5uaiazebgu3m