A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently Apache Spark and Apache Flink. Those platforms not only aim to improve performance through improvedarXiv:1803.10836v1 fatcat:o4gpa6gvsbdepjg2qez2aqzaiq