Simulation-Based Performance Prediction for Large Parallel Machines
International journal of parallel programming
We present a performance prediction environment for large-scale computers such as the Blue Gene machine. It consists of a parallel simulator, BigSim, for predicting the performance of machines with a very large number of processors, and BigNetSim, which incorporates a pluggable module implementing a detailed contention-based network model. Together, the simulators make it possible to predict performance for very large machines such as Blue Gene/L. We illustrate the utility of our simulators using
... and prediction studies of several applications using smaller numbers of processors for simulations.

Deciding the characteristics of an ideal programming environment for a massively parallel machine like IBM Blue Gene is a challenging task, because dealing with tens of thousands or even millions of processors requires a qualitative change in both the programming environment and the runtime system. Further, it is difficult to evaluate these programming models in a real context before such machines are built. To this end, we have developed a software emulator that mimics a class of target parallel machines, on which a multitier programming model is built. The lowest layer, a low-level programming API enabled by the emulator, provides a general message-passing interface for implementing a high-level parallel language, which forms the middle layer of our programming environment. The higher-level components of the programming environment consist of domain-specific languages and libraries.

... sor, switch (including ports and virtual channels), channels and network/processor interface. Other non-tangible network entities such as the protocol stack, flow control, routing, arbitration and topology are modeled in event methods across one or more posers. A network configuration file specifies parameters for bandwidth, latency, ports, virtual channels, routing scheme, buffer size and packet size, as well as an option to print link contention statistics. The network simulator framework is flexible enough to model arbitrary topologies, routing algorithms, and input and output virtual channel (VC) selection policies. For this paper, we chose a network design close to the actual Blue Gene/L network. The processor interface consists of network injection and reception FIFOs for transferring messages. Each node has sender and receiver units, which send messages to and receive messages from the network.
Internal channels connect the receivers and senders within the same node, and external channels connect the senders and receivers of neighboring nodes. Messages are split into packets of up to 256 bytes and injected into the network. The receiver then sends an arbitration request to the sender units, seeking to transfer data. Each receiver has four VCs, as shown in Figure 4. The escape VC helps prevent deadlocks. The bypass channel lets a packet flow through a node without any buffering. Each buffer has 1 KB of memory, enough to hold four full-sized packets. The escape VC can be used only when the dynamic VCs are unavailable, and only if space for a full-sized packet (256 B) is guaranteed to remain available even after space has been reserved for the current packet.
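
As an illustration, the packetization and escape VC admission rule described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the BigNetSim implementation: the function names `split_into_packets` and `choose_vc` are hypothetical, a dynamic VC is assumed to be "unavailable" when its free buffer space cannot hold the packet, and the number of dynamic VCs is left as an input rather than fixed.

```python
from typing import List, Optional

MAX_PACKET_BYTES = 256   # maximum packet size stated in the text
VC_BUFFER_BYTES = 1024   # 1 KB per buffer, i.e. four full-sized packets


def split_into_packets(message_bytes: int) -> List[int]:
    """Return the sizes of the packets for one message (each <= 256 bytes)."""
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    full, rest = divmod(message_bytes, MAX_PACKET_BYTES)
    return [MAX_PACKET_BYTES] * full + ([rest] if rest else [])


def choose_vc(packet_bytes: int,
              dynamic_free: List[int],
              escape_free: int) -> Optional[str]:
    """Pick a virtual channel for a packet, or None if it must wait.

    dynamic_free: free bytes in each dynamic VC buffer (hypothetical input).
    escape_free:  free bytes in the escape VC buffer.
    """
    # Assumption: a dynamic VC is available if its buffer can hold the packet.
    for index, free in enumerate(dynamic_free):
        if free >= packet_bytes:
            return f"dynamic{index}"
    # Escape VC: only when no dynamic VC is available, and only if room for
    # one more full-sized packet remains after reserving the current packet.
    if escape_free - packet_bytes >= MAX_PACKET_BYTES:
        return "escape"
    return None
```

For example, `split_into_packets(600)` yields packets of 256, 256 and 88 bytes. With every dynamic VC full, a 256-byte packet is admitted to an escape buffer with 512 free bytes, but not to one with 511 free bytes, since a further full-sized packet would no longer fit.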