Concept and benchmark results for Big Data energy forecasting based on Apache Spark

Jorge Ángel González Ordiano, Andreas Bartschat, Nicole Ludwig, Eric Braun, Simon Waczowicz, Nicolas Renkamp, Nico Peter, Clemens Düpmeier, Ralf Mikut, Veit Hagenmeyer
2018 Journal of Big Data  
The present article describes a concept for creating and applying energy forecasting models in a distributed environment. It also presents a benchmark comparing the time required to train and apply data-driven forecasting models on a single computer and on a computing cluster. The comparison is based on a simulated dataset and uses both R and Apache Spark. The results indicate cases in which distributed computing based on Spark may be advantageous.
doi:10.1186/s40537-018-0119-6 fatcat:wofo5vx6lbcn7j3j3jht7wn7fy