Translating imperative code to MapReduce

Cosmin Radoi, Stephen J. Fink, Rodric Rabbah, Manu Sridharan
2014 Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications - OOPSLA '14  
We present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework. Automating such a translation is challenging: imperative updates must be translated into a functional MapReduce form in a manner that both preserves semantics and enables parallelism. Our approach works by first translating the input code into a functional representation, with loops succinctly represented by fold operations. Then, guided by rewrite rules, our system searches a space of equivalent programs for an effective MapReduce implementation. The rules include a novel technique for handling irregular loop-carried dependencies using group-by operations to enable greater parallelism. We have implemented our technique in a tool called MOLD. It translates sequential Java code into code targeting the Apache Spark runtime. We evaluated MOLD on several real-world kernels and found that in most cases MOLD generated the desired MapReduce program, even for codes with complex indirect updates.

    Map<String, Integer> wordCount(List<String> docs) {
      Map<String, Integer> m = new HashMap<>();
      for (int i = 0; i < docs.size(); i++) {
        // simplified word split for clarity
        String[] split = docs.get(i).split(" ");
        for (int j = 0; j < split.length; j++) {
          String w = split[j];
          Integer prev = m.get(w);
          if (prev == null) prev = 0;
          m.put(w, prev + 1);
        }
      }
      return m;
    }
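To make the idea of replacing an indirect update (m.put(w, prev + 1)) with a group-by more concrete, here is a minimal sketch using plain Java streams. It is not MOLD's output and does not use Apache Spark; the class name WordCountSketch and the method names wordCountLoop and wordCountGrouped are hypothetical, chosen only for illustration. The flatMap step plays the role of a map phase, and grouping words by key and then summing ones per key plays the role of the shuffle and reduce phases.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of MOLD or Spark.
public class WordCountSketch {

    // Sequential, imperative version: each iteration reads and updates
    // the shared map m, which is the loop-carried dependency.
    public static Map<String, Integer> wordCountLoop(List<String> docs) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            String[] split = docs.get(i).split(" ");
            for (int j = 0; j < split.length; j++) {
                String w = split[j];
                Integer prev = m.get(w);
                if (prev == null) prev = 0;
                m.put(w, prev + 1);
            }
        }
        return m;
    }

    // Functional version: emit every word (map), group equal words
    // together, and sum a 1 for each occurrence within a group (reduce).
    public static Map<String, Integer> wordCountGrouped(List<String> docs) {
        return docs.stream()
                .flatMap(doc -> Arrays.stream(doc.split(" ")))
                .collect(Collectors.groupingBy(w -> w, Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a b a", "b c");
        // Both versions should agree on every key and count.
        System.out.println(wordCountLoop(docs).equals(wordCountGrouped(docs)));
    }
}
```

Because the per-key sums are independent of one another, the grouped form has no dependency between iterations that touch different words, which is the property a MapReduce runtime can exploit.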
doi:10.1145/2660193.2660228 dblp:conf/oopsla/RadoiFRS14 fatcat:p6j3o4ntkrhcpgbabwmbm6eqhe