Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques
The Cerebras CS-1 is a computing system based on a wafer-scale processor having nearly 400,000 compute cores. It is intended for training of and inference on deep neural networks. The architecture has several features specifically designed for this and related fields. One of these is a sophisticated SIMD engine that can mimic a rectangular loop nest of depth at most four. In order to achieve optimal performance, it is crucial to use SIMD instructions as much as possible. This paper describes a
... paper describes a high-level polyhedral compiler that takes a high-level algorithm description that can be written manually or extracted from a TensorFlow computation graph and generates input to the low-level C-based compiler. In this intermediate code, the use of SIMD instructions is made explicit. The main focus of the paper is the generation of these CS-1 SIMD instructions for convolution style algorithms. What complicates the task is that the set of computation instances that need to be performed may not at first sight look like they form a rectangular loop nest. The basis of the compilation is formed by an effective combination of relatively well-known, but more specialized polyhedral operations.