Optimizing data aggregation for cluster-based internet services

Lingkun Chu, Hong Tang, Tao Yang, Kai Shen
2003 Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '03  
Large-scale cluster-based Internet services often host partitioned datasets to provide incremental scalability. The aggregation of results produced from multiple partitions is a fundamental building block for the delivery of these services. This paper presents the design and implementation of a programming primitive, Data Aggregation Call (DAC), to exploit partition parallelism for cluster-based Internet services. A DAC request specifies a local processing operator and a global reduction
..., and it aggregates the local processing results from participating nodes through the global reduction operator. Applications may allow a DAC request to return partial aggregation results as a tradeoff between quality and availability. Our architecture design aims at improving interactive responses with sustained throughput for typical cluster environments where platform heterogeneity and software/hardware failures are common. At the cluster level, our load-adaptive reduction tree construction algorithm balances processing and aggregation load across servers while exploiting partition parallelism. Inside each node, we employ an event-driven thread pool design that prevents slow nodes from adversely affecting system throughput under highly concurrent workloads. We further devise a staged timeout scheme that eagerly prunes slow or unresponsive servers from the reduction tree to meet soft deadlines. We have used the DAC primitive to implement several applications: a search engine document retriever, a parallel protein sequence matcher, and an online parallel facial recognizer. Our experimental and simulation results validate the effectiveness of the proposed optimization techniques for (1) reducing response time, (2) improving throughput, and (3) gracefully handling server unresponsiveness. We also demonstrate (4) the ease of use of the DAC primitive and (5) the scalability of our architecture design.
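The semantics described in the abstract (a local operator applied per partition, a global reduction over the results, and an optional partial result under a soft deadline) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `data_aggregation_call`, its parameters, and the flat fan-out are assumptions made for illustration, and the sketch does not model the load-adaptive reduction tree, the event-driven thread pool, or the staged timeout scheme.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from functools import reduce
import time


def data_aggregation_call(partitions, local_op, reduce_op, deadline_s):
    """Hypothetical DAC-like helper (illustrative, not the paper's API).

    Applies local_op to every partition in parallel, waits at most
    deadline_s seconds, and folds the results that arrived in time with
    reduce_op. Returns (aggregate, number_of_partitions_included), so the
    caller can tell a full result from a partial one.
    """
    if not partitions:
        return None, 0
    # One worker per partition stands in for one server per partition.
    pool = ThreadPoolExecutor(max_workers=len(partitions))
    futures = [pool.submit(local_op, p) for p in partitions]
    done, _pending = wait(futures, timeout=deadline_s)
    # Do not block on stragglers; their results are simply dropped.
    pool.shutdown(wait=False)
    results = [f.result() for f in futures
               if f in done and f.exception() is None]
    if not results:
        return None, 0
    return reduce(reduce_op, results), len(results)


def add(a, b):
    return a + b


def slow_on_last(partition):
    # Simulates one unresponsive node: the partition [6] answers late.
    if partition == [6]:
        time.sleep(1.0)
    return sum(partition)


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5], [6]]
    print(data_aggregation_call(parts, sum, add, deadline_s=1.0))
    print(data_aggregation_call(parts, slow_on_last, add, deadline_s=0.2))
```

In the second call the slow partition misses the deadline, so the caller receives an aggregate over two of the three partitions, which mirrors the quality-versus-availability tradeoff the abstract mentions.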
doi:10.1145/781498.781517 dblp:conf/ppopp/ChuTYS03 fatcat:ykr5yvvkg5hyhoh6cu4eqewstu