Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations
IEEE Transactions on Visualization and Computer Graphics
The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever-increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect.
In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D's performance on an IBM Blue Gene/P system.

Index Terms: Performance analysis, network traffic visualization, projected graph layouts.

The Message Passing Interface (MPI) is the most dominant model for high-performance computing communication. However, our techniques also apply to other programming models such as Charm++. In an MPI program, communication between processes is expressed through routines that send data between pairs or sets of processes. Given this frame of reference, it is natural to analyze message behavior as a communication graph in which nodes are processes and edges represent data exchanges. A number of tools visualize MPI communication behavior in this fashion [8, 9, 17, 23]. However, this type of analysis disregards the routing of messages on the physical network hardware. Different MPI implementations, application domain decompositions, or system configurations may realize communication primitives such as global reductions or all-to-all messaging differently. Furthermore, the mapping of individual processes to cores strongly influences the network traffic because the lengths of message paths may vary, and the paths may interleave or interfere in complex ways. Moreover, systems like the IBM Blue Gene/P may dynamically alter the paths taken by messages.
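The process-level communication graph described above can be constructed by aggregating message events into weighted edges. The following is a minimal sketch; the event format (`(sender, receiver, bytes)` tuples) and the function name are illustrative assumptions, not part of the paper's tooling:

```python
from collections import defaultdict

def build_comm_graph(events):
    """Aggregate point-to-point message events into a weighted graph.

    Each event is a (sender_rank, receiver_rank, num_bytes) tuple;
    nodes are MPI ranks and each directed edge weight is the total
    number of bytes exchanged between that pair of ranks.
    """
    graph = defaultdict(int)
    for src, dst, nbytes in events:
        graph[(src, dst)] += nbytes
    return dict(graph)

# Rank 0 sends twice to rank 1 and once to rank 2:
events = [(0, 1, 1024), (0, 1, 512), (0, 2, 256)]
graph = build_comm_graph(events)  # {(0, 1): 1536, (0, 2): 256}
```

Note that such a graph captures only logical exchanges between ranks; it says nothing about which physical links the corresponding packets traverse, which is precisely the limitation the paper addresses.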
Different physical routes may even be used for different parts of a message. To fully understand these effects on the network traffic and to diagnose performance bottlenecks, one must instead analyze the physical packets sent on the network. Packets are the units of network traffic used for routing within the hardware interconnect. Here we propose a visualization framework to illustrate and analyze the network traffic of packets on some of the largest simulations performed on modern supercomputers. In particular, we show how different process-to-core assignments affect the network traffic, and, through carefully designed visualizations, we gain insights for optimizing performance.

Our contributions in this work center on the design and application of a visualization tool that explores the behavior of the network traffic. We make use of a context that is both familiar to application developers and representative of the underlying hardware interconnect. Few tools directly visualize performance data using a similar context [4, 13, 14, 32], and none of these provide the flexibility or insight of our system. In particular, our approach to visualizing network traffic uses projections of the network topology. These projections are two-dimensional and retain the intrinsic characteristics of the hardware network while illuminating communication patterns without visual clutter. We consider three-dimensional torus networks, common to many HPC systems such as the IBM Blue Gene series. Consequently, we augment this 2D view with an interactive, linked 3D view that provides a familiar context to application developers on these platforms. Together, these linked views assist application developers in two ways. First, they allow developers to understand trends in application communication from two illustrative viewpoints.
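The planar projections described above can be thought of as collapsing one dimension of the 3D torus. The following is a minimal sketch, assuming per-node packet counts are stored in a 3D array; the array layout and function name are hypothetical:

```python
import numpy as np

def project_torus_traffic(traffic, axis=2):
    """Collapse a 3D torus traffic volume onto a 2D plane by summing
    packet counts along one torus dimension; congested rows or
    columns then stand out in the resulting planar heatmap."""
    return traffic.sum(axis=axis)

# A 4x4x4 torus with a uniform load of 10 packets per node
# projects to a 4x4 plane in which every entry is 40.
traffic = np.full((4, 4, 4), 10)
plane = project_torus_traffic(traffic)
```

Choosing which axis to collapse, and inspecting all three projections side by side, is one way such a simplified 2D layout can reveal patterns that are hard to see in the full 3D structure.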
Second, application developers are able to understand the connection between the mapping of MPI processes onto nodes and the resulting network traffic during runs on massive supercomputers. We conduct case studies using our approach to evaluate the performance of pF3D, a multi-physics code used to simulate laser-plasma interaction [5, 35]. These studies highlight the strengths of the visualization and the insights it provides to the performance experts on our team.
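The influence of process placement on message path length, noted above, can be quantified by hop distance on the torus. The following is a minimal sketch, assuming minimal-path routing on a torus with wraparound links; the function name is hypothetical:

```python
def torus_hops(a, b, dims):
    """Shortest hop count between nodes a and b on a torus: in each
    dimension, take the minimum of the direct distance and the
    distance through the wraparound link."""
    return sum(min(abs(x - y), d - abs(x - y))
               for x, y, d in zip(a, b, dims))

# On an 8x8x8 torus, (0,0,0) and (7,0,0) are one hop apart
# via the wraparound link rather than seven hops.
hops = torus_hops((0, 0, 0), (7, 0, 0), (8, 8, 8))  # 1
```

Summing such distances, weighted by message volume, over all communicating pairs gives a rough cost estimate for a candidate process-to-core mapping, although it ignores the link contention and dynamic routing effects that the packet-level visualization exposes.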