Vector Processing as a Soft Processor Accelerator

Jason Yu, Christopher Eagleston, Christopher Han-Yu Chou, Maxime Perreault, Guy Lemieux
2009 ACM Transactions on Reconfigurable Technology and Systems  
Current FPGA soft processor systems use dedicated hardware modules or accelerators to speed up data-parallel applications. This work explores an alternative approach of using a soft vector processor as a general-purpose accelerator. The approach has the benefits of a purely software-oriented development model, a fixed ISA allowing parallel software and hardware development, a single accelerator that can accelerate multiple applications, and scalable performance from the same source code. With no hardware design experience needed, a software programmer can make area-versus-performance tradeoffs by scaling the number of functional units and register file bandwidth with a single parameter. A soft vector processor can be further customized by a number of secondary parameters to add or remove features for a specific application to optimize resource utilization. This paper introduces VIPERS, a soft vector processor architecture that maps efficiently into an FPGA and provides a scalable amount of performance for a reasonable amount of area. Compared to a Nios II/s processor, instances of VIPERS with 32 processing lanes achieve up to 44× speedup using up to 26× the area.
Figure 1: VIPERS soft vector processor consisting of scalar core and multiple vector lanes.
Three ways of accelerating FPGA-based applications with plenty of data-level parallelism are: 1) build a multiprocessor system and write parallel code, 2) build a dedicated hardware accelerator in the FPGA logic, or 3) improve the processor design to exploit more parallelism. The first approach requires dealing with the complexity of parallel system design and debugging, and coping with incoherent memory or deadlock. The second approach requires some level of hardware design experience, even with high-level tools like the Nios II C2H Compiler [Altera Corp. 2008b], which automatically compiles a C function into a hardware accelerator. The third approach is limited because traditional VLIW and superscalar architectures do not scale much beyond 4-way parallelism and do not map efficiently to FPGAs.
An ideal approach would combine the advantages of all these methods: (1) have scalable performance and resource usage, (2) be simple to use, ideally requiring no hardware design effort, (3) separate hardware and software design flows early in the design, and (4) enable rapid development by avoiding synthesis, place and route iterations. A soft vector processor, such as the one shown in Figure 1, addresses all of these requirements. A vector processor is a particularly good choice for applications with abundant data-parallelism. These same applications are also frequently accelerated using custom-designed hardware. This paper introduces VIPERS, Vector ISA Processors for Embedded Reconfigurable Systems, as a solution to deliver scalable performance and resource usage through configurability for data-parallel applications. It provides a simple programming model that can
doi:10.1145/1534916.1534922 fatcat:bwu777f7onbn5j5tptvztpmho4