Shark: SQL and Rich Analytics at Scale
[article]
2012
arXiv
pre-print
Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100x faster than Apache Hive, and machine learning programs up to 100x faster than Hadoop. Unlike previous
arXiv:1211.6176v1
fatcat:cdpyu3sp3bd7rcdzaaci4juayi