Highly Parallel Multi-FPGA System Compilation from Sequential C/C++ Code in the AWS Cloud
ACM Transactions on Reconfigurable Technology and Systems
We present a High Level Synthesis compiler that automatically obtains a multi-chip accelerator system from a single-threaded sequential C/C++ application. Invoking the multi-chip accelerator is functionally identical to invoking the single-threaded sequential code the multi-chip accelerator is compiled from. Therefore, software development for using the multi-chip accelerator hardware is simplified, but the multi-chip accelerator can exhibit extremely high parallelism. We have implemented,
... d and verified our push-button system design model on multiple FPGAs of the Amazon Web Services EC2 F1 instances platform, using, as an example, a sequential-natured DES key search application that does not have any DOALL loops and that tries each candidate key in order and stops as soon as a correct key is found. An 8 FPGA accelerator produced by our compiler achieves 44, 600 times better performance than an x86 Xeon CPU executing the sequential single-threaded C program the accelerator was compiled from. New features of our compiler system include: an ability to parallelize outer loops with loop-carried control dependences, an ability to pipeline an outer loop without fully unrolling its inner loops, and fully automated deployment, execution and termination of multi-FPGA application-specific accelerators in the AWS cloud, without requiring any manual steps.