Reaching fast code faster

Won So, Alexander G. Dean
2006 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems - CASES '06  
When integrating software threads together to boost performance on a processor with instruction-level parallel processing support, it is rarely clear which code regions should be aligned and integrated, and which regions should be left alone. This problem grows even worse on a modern VLIW DSP due to complicating factors in both the hardware and compiler: software pipelining, predication, branch delay slots, load delay slots and limited resources. As a result, finding an effective integration strategy requires extensive iteration through the integrate/compile/analyze sequence. In this paper we introduce methods to quantitatively estimate the performance benefit from the integration of multiple software threads. We use resource modeling, consider register pressure and compensate for compiler optimizations. This enables different scenarios to be compared and ranked. We then use these estimates to guide integration by concentrating on the most beneficial scenario. Information from each iteration of compilation is used to update the rankings of scenarios. We find that our modeling methods combined with limited compilation quickly find the best integration scenario without requiring exhaustive integration.
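The rank, compile, and re-rank loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `guided_search`, the per-region gain table, the rule that spreads the prediction error evenly over a scenario's regions, and all numbers are hypothetical assumptions chosen only to show the control flow. The `measure` callback stands in for a real integrate/compile/analyze step.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# A scenario is the set of code regions that would be integrated together.
Scenario = FrozenSet[str]


def guided_search(
    scenarios: List[Scenario],
    region_gain: Dict[str, float],
    measure: Callable[[Scenario], float],
    budget: int,
) -> Tuple[Scenario, float, int]:
    """Compile scenarios in order of estimated gain, re-ranking after each one.

    Assumes every scenario is non-empty and budget >= 1.
    """
    gain = dict(region_gain)            # working copy of per-region estimates
    measured: Dict[Scenario, float] = {}

    def estimate(s: Scenario) -> float:
        return sum(gain[r] for r in s)

    for _ in range(budget):
        pending = [s for s in scenarios if s not in measured]
        if not pending:
            break
        top = max(pending, key=estimate)        # current best-ranked scenario
        predicted = estimate(top)
        # Stop once no untried scenario is predicted to beat a measured one.
        if measured and predicted <= max(measured.values()):
            break
        actual = measure(top)                   # stand-in for a real compile
        measured[top] = actual
        # Hypothetical update rule: spread the error evenly over the regions.
        error = (actual - predicted) / len(top)
        for r in top:
            gain[r] += error

    best = max(measured, key=measured.get)
    return best, measured[best], len(measured)


# Made-up example data: estimated gains per region, and "true" gains that a
# compile would reveal for each scenario (illustrative numbers only).
A, B, C = "loopA", "loopB", "loopC"
estimates = {A: 40.0, B: 30.0, C: 10.0}
truth = {
    frozenset({A}): 35.0,
    frozenset({B}): 30.0,
    frozenset({A, B}): 50.0,
    frozenset({A, C}): 38.0,
    frozenset({A, B, C}): 45.0,
}
best, best_gain, compiled = guided_search(
    list(truth), estimates, lambda s: truth[s], budget=5
)
print(sorted(best), best_gain, compiled)  # ['loopA', 'loopB'] 50.0 2
```

With this toy data the search compiles two of the five scenarios before the remaining estimates fall below the best measured gain. A real estimator would replace the additive per-region table with the resource and register-pressure modeling the abstract mentions.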
doi:10.1145/1176760.1176764 dblp:conf/cases/SoD06 fatcat:j7xkib5u25ehvjdwx2jwme5uty