Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip

Paolo Mantovani, Emilio G. Cota, Christian Pilato, Giuseppe Di Guglielmo, Luca P. Carloni
2016 Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems - CASES '16  
Local memory is a key factor for the performance of accelerators in SoCs. Despite technology scaling, the gap between on-chip storage and the memory footprint of embedded applications keeps widening. We present a solution to preserve the speedup of accelerators when scaling from small to large data sets. Combining specialized DMA and address translation with a software layer in Linux, our design is transparent to user applications and broadly applicable to any class of SoCs hosting high-throughput accelerators. We demonstrate the robustness of our design across many heterogeneous workload scenarios and memory allocation policies with FPGA-based SoC prototypes featuring twelve concurrent accelerators accessing up to 768 MB out of 1 GB of addressable DRAM.
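The abstract does not spell out how the DMA and address-translation hardware work. As a hypothetical illustration only, and not the authors' design, the sketch below shows the general idea of letting an accelerator see one contiguous buffer that is actually scattered across physically non-contiguous pages. It consults a per-buffer page table and splits each DMA request at page boundaries. The 1 MiB page size, the flat list used as a page table, and the names `translate` and `split_dma` are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical size of each physically contiguous chunk backing the buffer
# (an assumption for illustration, not a value taken from the paper).
PAGE_SIZE = 1 << 20  # 1 MiB


def translate(vaddr: int, page_table: List[int], page_size: int = PAGE_SIZE) -> int:
    """Map an accelerator-visible offset into the buffer to a physical address.

    page_table[i] holds the physical base address of the i-th chunk.
    """
    index, offset = divmod(vaddr, page_size)
    if index >= len(page_table):
        raise ValueError(f"offset {vaddr:#x} is outside the mapped buffer")
    return page_table[index] + offset


def split_dma(vaddr: int, length: int, page_table: List[int],
              page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Break one logical DMA request into (physical address, size) bursts
    that never cross a page boundary, so each burst is physically contiguous."""
    bursts = []
    while length > 0:
        # Bytes remaining before the current virtual page ends.
        chunk = min(length, page_size - vaddr % page_size)
        bursts.append((translate(vaddr, page_table, page_size), chunk))
        vaddr += chunk
        length -= chunk
    return bursts


if __name__ == "__main__":
    # Three chunks scattered across DRAM in no particular order.
    table = [0x4000_0000, 0x2A30_0000, 0x1000_0000]
    # A 512-byte request that straddles the boundary between chunks 0 and 1.
    for phys, size in split_dma(0xFFF00, 0x200, table):
        print(f"burst at {phys:#x}, {size} bytes")
```

Run as a script, the example splits the 512-byte request into two 256-byte bursts, one at the tail of chunk 0 (`0x400fff00`) and one at the start of chunk 1 (`0x2a300000`). Larger pages mean a smaller page table and fewer split bursts, at the cost of needing larger physically contiguous allocations from the operating system.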
doi:10.1145/2968455.2968509 dblp:conf/cases/MantovaniCPGC16 fatcat:ndxr2qufubanpjvtd2i3xjkqmq