A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Technical Report: On the Usability of Hadoop MapReduce, Apache Spark & Apache Flink for Data Science
[article]
2018
arXiv
pre-print
Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow-oriented platforms, most prominently Apache Spark and Apache Flink. These platforms not only aim to improve performance through improved
arXiv:1803.10836v1
fatcat:o4gpa6gvsbdepjg2qez2aqzaiq