SparkR

Shivaram Venkataraman, Ion Stoica, Matei Zaharia, Zongheng Yang, Davies Liu, Eric Liang, Hossein Falaki, Xiangrui Meng, Reynold Xin, Ali Ghodsi, Michael Franklin
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited as the R runtime is single-threaded and can only process data sets that fit in a single machine's memory. We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark's distributed computation engine to enable large-scale data analysis from the R shell. We describe the main goals of SparkR, discuss how the high-level DataFrame API enables scalable computation, and present some of the key details of our implementation.
doi:10.1145/2882903.2903740 dblp:conf/sigmod/VenkataramanYLL16 fatcat:inpgt6bmmne43bgtkzzhjv7vxu