Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors

J. Sahuquillo, S. Petit, A. Pont, V. Milutinović
2005 Journal of systems architecture  
Current technology continues providing smaller and faster transistors, so processor architects can offer more complex and functional ILP processors, because manufacturers can fit more transistors on the same chip area. As a consequence, the fraction of chip area reachable in a single clock cycle is dropping, and at the same time the number of transistors on the chip is increasing. However, problems related with power consumption and heat dissipation are worrying. This scenario is forcing
more » ... or designers to look for new processor organizations that can provide the same or more performance but using smaller sizes. This fact especially affects the on-chip cache memory design; therefore, studies proposing new smaller cache organizations while maintaining, or even increasing, the hit ratio are welcome. In this sense, the cache schemes that propose a better exploitation of data locality (bypassing schemes, prefetching techniques, victim caches, etc.) are a good example. This paper presents a data cache scheme called filter cache that splits the first level data cache into two independent organizations, and its performance is compared with two other proposals appearing in the open literature, as well as larger classical caches. To check the performance two different scenarios are considered: a superscalar processor and a symmetric multiprocessor. The obtained results show that (i) in the superscalar processor the split data caches perform similarly or better than larger conventional caches, (ii) some splitting schemes work well in multiprocessors while others work less well because of data localities, (iii) the reuse information that some split schemes incorporate for managing is also useful for designing new competitive protocols to boost performance in multiprocessors, (iv) the filter data cache achieves the best performance in both scenarios.
doi:10.1016/j.sysarc.2004.12.002 fatcat:4jdsqp6ipzc6tdshyk7y6cxg74