A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
Cluster computing frameworks such as Apache Hadoop and Apache Spark are commonly used to analyze large data sets. The analysis often involves running multiple, similar queries on the same data sets. This data reuse should improve query performance, but we find that these frameworks schedule query tasks independently of each other and are thus unable to exploit the data sharing across these tasks. We present Quartet, a system that leverages information on cached data to schedule together tasks that share data.
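The idea described above, assigning tasks to nodes based on what data those nodes already cache, can be illustrated with a minimal sketch. This is not Quartet's actual algorithm or API; all names, data structures, and the greedy overlap heuristic here are assumptions made for illustration only.

```python
# Hypothetical sketch of cache-aware co-scheduling in the spirit of the
# abstract above. Names and structures are illustrative, not Quartet's API.

def schedule(tasks, cache_map):
    """Assign each task to the node caching most of its input blocks.

    tasks: list of (task_id, set of input block ids)
    cache_map: dict mapping node id -> set of blocks cached on that node
    Returns a dict mapping task_id -> chosen node.
    """
    assignment = {}
    for task_id, blocks in tasks:
        # Greedy heuristic (an assumption): pick the node whose cache
        # overlaps most with this task's input, so tasks that share
        # data tend to be placed together.
        best = max(cache_map, key=lambda n: len(blocks & cache_map[n]))
        assignment[task_id] = best
        # Assume the node caches the task's input after running it.
        cache_map[best] |= blocks
    return assignment


if __name__ == "__main__":
    tasks = [("t1", {"a", "b"}), ("t2", {"a", "c"})]
    cache_map = {"n1": {"a"}, "n2": set()}
    print(schedule(tasks, cache_map))
```

With this toy input, both tasks read block "a", so the heuristic places them on the same node ("n1"), which is the co-scheduling behavior the abstract motivates.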