Cache-conscious graph collaborative filtering on multi-socket multicore systems

Lifeng Nai, Yinglong Xia, Ching-Yung Lin, Bo Hong, Hsien-Hsin S. Lee
2014 Proceedings of the 11th ACM Conference on Computing Frontiers - CF '14  
Recommendation systems using graph collaborative filtering often require real-time responses and high throughput. Therefore, besides recommendation accuracy, it is critical to study high-performance concurrent collaborative filtering on modern platforms. To achieve high performance, we study the graph data locality characteristics of collaborative filtering. Our experiments demonstrate that although an individual graph traversal exhibits poor data locality, multiple queries have a tendency ... sharing their data footprints, especially in the case of queries with neighboring root vertices. Such characteristics lead to both inter- and intra-thread data locality, which can be utilized to significantly improve collaborative filtering performance. Based on these observations, we present a cache-conscious system for collaborative filtering on modern multi-socket multicore platforms. In this system, we propose a cache-conscious query scheduling technique and an in-memory graph representation. To maximize cache performance and minimize cross-core/socket communication overhead, we address both inter- and intra-thread data locality. To address the workload balancing issue, this study introduces a dynamic work-stealing mechanism to explore the tradeoff between workload balancing and cache-consciousness. The proposed system was evaluated on a Power7+ system against the IBM Knowledge Repository graph dataset. The results demonstrated both good scalability and throughput. Compared with the basic system that does not perform cache-conscious scheduling, inter-thread scheduling improves throughput by up to 18%. Intra-thread scheduling can further improve throughput by as much as 22%. By enabling dynamic work-stealing, the proposed technique balances workloads across all threads with a low standard deviation of the per-thread processing time.
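The scheduling idea described in the abstract can be illustrated with a minimal, single-threaded sketch. It is not the authors' implementation. It assumes that queries are identified by their root vertex ID and that numerically close IDs stand in for "neighboring" roots, which the abstract does not state. The names `schedule_queries` and `next_query` are hypothetical.

```python
from collections import deque

def schedule_queries(roots, num_workers):
    """Sort query roots and split them into contiguous per-worker queues.

    Assumption: nearby vertex IDs approximate neighboring root vertices,
    so each worker receives queries likely to share a data footprint.
    """
    ordered = sorted(roots)
    base, extra = divmod(len(ordered), num_workers)
    queues, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        queues.append(deque(ordered[start:start + size]))
        start += size
    return queues

def next_query(queues, worker):
    """Return the next query for a worker, stealing only when it is idle.

    Local work is taken from the head of the worker's own queue. An idle
    worker steals from the tail of the longest queue, so the victim keeps
    the queries closest to the ones it is currently processing.
    """
    if queues[worker]:
        return queues[worker].popleft()
    victim = max(range(len(queues)), key=lambda i: len(queues[i]))
    if queues[victim]:
        return queues[victim].pop()
    return None  # no work left anywhere
```

Stealing trades some locality for balance: a stolen query runs on a thread whose cache holds a different part of the graph, which is the tradeoff between workload balancing and cache-consciousness that the abstract mentions.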
doi:10.1145/2597917.2597935 dblp:conf/cf/NaiXLHL14 fatcat:mxhc2q2y5jam3kxossw6bxhikq