Computation Reuse in Analytics Job Service at Microsoft

Alekh Jindal, Sriram Rao, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos
2018 Proceedings of the 2018 International Conference on Management of Data - SIGMOD '18  
Analytics-as-a-service, or analytics job service, is emerging as a new paradigm for data analytics, be it in a cloud environment or within enterprises. In this setting, users are not required to manage or tune their hardware and software infrastructure, and they pay only for the processing resources consumed per job. However, the shared nature of these job services across several users and teams leads to significant overlaps in partial computations, i.e., parts of the processing are duplicated across multiple jobs, thus generating redundant costs. In this paper, we describe a computation reuse framework, coined CLOUDVIEWS, which we built to address the computation overlap problem in Microsoft's SCOPE job service. We present a detailed analysis from our production workloads to motivate the computation overlap problem and the possible gains from computation reuse. The key aspects of our system are the following: (i) we reuse computations by creating materialized views over recurring workloads, i.e., periodically executing jobs that have the same script templates but process new data each time, (ii) we select the views to materialize using a feedback loop that reconciles the compile-time and run-time statistics and gathers precise measures of the utility and cost of each overlapping computation, and (iii) we create materialized views in an online fashion, without requiring an offline phase to materialize the overlapping computations.
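As a rough illustration of aspect (ii), the sketch below ranks overlapping computations by the run time they could save relative to the storage they would occupy. It is not the CLOUDVIEWS selection algorithm, which the abstract does not specify. The `Subexpression` record, its fields, the `select_views` function, the utility formula, and the greedy storage-budget heuristic are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Subexpression:
    """One observed run of a sub-computation (all fields are hypothetical)."""
    signature: str      # stable identifier of the logical computation
    job_id: str         # job in which the computation was observed
    runtime_s: float    # measured run-time cost, in seconds
    output_bytes: int   # measured size of the result


def select_views(observations, storage_budget_bytes):
    """Greedily choose overlapping computations to materialize."""
    # Group run-time observations by the computation they represent.
    groups = defaultdict(list)
    for obs in observations:
        groups[obs.signature].append(obs)

    candidates = []
    for signature, runs in groups.items():
        jobs = {r.job_id for r in runs}
        if len(jobs) < 2:
            continue  # seen in one job only, so there is nothing to reuse
        avg_runtime = sum(r.runtime_s for r in runs) / len(runs)
        size = max(r.output_bytes for r in runs)
        # Assumed utility: the first job still computes the view,
        # every other job reuses it and saves the average run time.
        utility = (len(jobs) - 1) * avg_runtime
        candidates.append((signature, utility, size))

    # Prefer the most run time saved per byte stored.
    candidates.sort(key=lambda c: c[1] / max(c[2], 1), reverse=True)

    chosen, used = [], 0
    for signature, _utility, size in candidates:
        if used + size <= storage_budget_bytes:
            chosen.append(signature)
            used += size
    return chosen


observations = [
    Subexpression("scan+filter", "job1", 100.0, 1000),
    Subexpression("scan+filter", "job2", 100.0, 1000),
    Subexpression("scan+filter", "job3", 100.0, 1000),
    Subexpression("join", "job1", 50.0, 5000),
    Subexpression("join", "job2", 50.0, 5000),
    Subexpression("agg", "job3", 10.0, 10),
]
small_budget = select_views(observations, 2000)
large_budget = select_views(observations, 10000)
```

In this toy input, `agg` is never a candidate because it appears in a single job, and `join` is selected only when the storage budget is large enough to hold its output alongside `scan+filter`.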
doi:10.1145/3183713.3190656 dblp:conf/sigmod/JindalQPYDBFLKR18 fatcat:46raoj4saba7vdj7kjneo6vf3m