A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is
With more than 1200 contributors, Apache Spark is one of the most actively developed open source projects. At this scale and pace of development, mistakes are bound to happen. In this paper we present SparkFuzz, a toolkit we developed at Databricks for uncovering correctness errors in the Spark SQL engine. To guard the system against correctness errors, SparkFuzz takes a fuzzing approach to testing by generating random data and queries. Spark-Fuzz executes the generated queries on a referencedoi:10.1145/3395032.3395327 dblp:conf/sigmod/GhitPRXB20 fatcat:45srxyrnxjartd5ata2rlk4qzm