Characterization of Shared-Memory Multi-Core Applications
Jordanian Journal of Computers and Information Technology
Multicore processor architectures have gained increasing popularity in recent years. However, many available applications cannot take full advantage of these architectures. Therefore, researchers have developed several characterization techniques to help programmers understand the behavior of these applications on multicore platforms and tune them for better efficiency. This paper proposes an on-the-fly, configuration-independent characterization approach for
... the inherent characteristics of multicore applications. This approach is fast because it does not depend on the details of any specific machine configuration and does not need to be repeated for every target configuration. It only tracks memory accesses and the cores that perform them by piping memory traces, on the fly, to the analysis tool. We applied this approach to characterize eight applications drawn from the SPLASH-2 and PARSEC benchmark suites. This paper presents the inherent characteristics of these applications, including memory access instructions, communication characteristics and patterns, sharing degree, invalidation degree, communication slack and communication locality. The results show that two of the studied applications, Cholesky and Fluidanimate, have high parallelization overhead. The results also indicate that the studied SPLASH-2 applications have higher communication rates than the studied PARSEC applications and that these rates generally increase with the number of threads used. Most of the sharing and invalidation occurs in small degrees. However, two of the SPLASH-2 applications have a significant fraction of communication with high sharing degrees involving four or more threads. Most of the applications have some uniform communication component, and the initial thread is generally involved in more communication than the other threads.
KEYWORDS Multi-core processor, On-the-fly analysis, Shared memory applications, Communication patterns, Performance evaluation.
reduce these patterns. In general, communication rates increase with more threads, and PARSEC applications have lower rates than SPLASH-2 applications. Almost all the sharing in Radix, FFT and Blackscholes is with only one thread. In Fluidanimate and Swaptions, about 23% of the sharing is with two threads. In LU, Cholesky and Canneal, 97%, 42% and 24% of the sharing, respectively, is with two or more threads.
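To make the notion of sharing degree concrete, the following is a minimal sketch of how such a histogram could be derived from a memory trace. It is not CIAT's implementation. The trace format `(thread_id, op, address)`, the 64-byte line granularity, the function name `sharing_degree_histogram`, and the definition used here (the number of distinct other threads that read a line between two consecutive writes to it) are all assumptions made for illustration.

```python
from collections import Counter

LINE_BYTES = 64  # assumed tracking granularity; not taken from the paper


def sharing_degree_histogram(trace, line_bytes=LINE_BYTES):
    """Histogram of sharing degrees over a memory trace (illustrative only).

    trace: iterable of (thread_id, op, address), op being "R" or "W".
    Assumed definition: the sharing degree of a write is the number of
    distinct *other* threads that read the line before its next write.
    Writes that no other thread reads are not counted as sharing.
    """
    writer = {}    # line -> thread that performed the latest write
    readers = {}   # line -> other threads that read since that write
    hist = Counter()

    for tid, op, addr in trace:
        line = addr // line_bytes
        if op == "W":
            # Close the previous write epoch of this line, if it was shared.
            if readers.get(line):
                hist[len(readers[line])] += 1
            writer[line] = tid
            readers[line] = set()
        elif op == "R":
            # Only reads by a thread other than the producer count.
            if line in writer and writer[line] != tid:
                readers[line].add(tid)
        else:
            raise ValueError(f"unknown operation: {op!r}")

    # Close the epochs that are still open at the end of the trace.
    for consumers in readers.values():
        if consumers:
            hist[len(consumers)] += 1
    return hist


if __name__ == "__main__":
    demo = [(0, "W", 0), (1, "R", 0), (2, "R", 8), (0, "W", 0), (1, "R", 0)]
    print(dict(sharing_degree_histogram(demo)))
```

Because the function consumes any iterable one access at a time, it could in principle be fed from a pipe instead of a stored trace file, which matches the on-the-fly style described above.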
The invalidation degrees in most of the applications are similar to their sharing degrees. There is considerable diversity in the communication locality of the studied applications. Some applications, such as FFT, Canneal and Blackscholes, show uniform communication components. Others show nonuniform communication, and in almost all applications the initial thread communicates with the other threads. Therefore, it is advisable to assign the initial thread to a central core to reduce the communication cost. As future work, we plan to extend CIAT to capture the instruction stream in addition to the data stream. Moreover, we need to extend CIAT to handle additional parallelization schemes, such as the pipeline parallelization scheme used in three PARSEC applications: Dedup, Ferret and X264.
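The placement advice above can be illustrated with a small sketch that builds a thread-to-thread communication matrix from a trace and picks the thread with the largest total communication as the candidate for a central core. This is an illustration under stated assumptions, not the paper's tool: the trace format `(thread_id, op, address)`, the 64-byte line granularity, the rule that one communication event is counted the first time a thread reads a line last written by another thread, and the names `communication_matrix` and `most_communicating_thread` are all hypothetical.

```python
def communication_matrix(trace, num_threads, line_bytes=64):
    """Build matrix[producer][consumer] of communication events.

    Assumed rule: an event is counted the first time a thread reads a
    line whose latest write came from a different thread.
    """
    last_writer = {}   # line -> thread that wrote it last
    counted = {}       # line -> consumers already counted for that write
    matrix = [[0] * num_threads for _ in range(num_threads)]

    for tid, op, addr in trace:
        line = addr // line_bytes
        if op == "W":
            last_writer[line] = tid
            counted[line] = set()
        elif op == "R":
            producer = last_writer.get(line)
            if (producer is not None and producer != tid
                    and tid not in counted[line]):
                matrix[producer][tid] += 1
                counted[line].add(tid)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return matrix


def most_communicating_thread(matrix):
    """Return the thread with the largest produced plus consumed volume."""
    n = len(matrix)
    totals = [
        sum(matrix[t]) + sum(row[t] for row in matrix)  # row sum + column sum
        for t in range(n)
    ]
    return max(range(n), key=totals.__getitem__)


if __name__ == "__main__":
    demo = [(0, "W", 0), (1, "R", 0), (2, "R", 0), (1, "W", 64), (0, "R", 64)]
    m = communication_matrix(demo, num_threads=3)
    print(m, most_communicating_thread(m))
```

A roughly flat matrix would correspond to the uniform communication component mentioned above, while a dominant row or column for thread 0 would reflect an initial thread that communicates with all the others.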