Steno

Derek Gordon Murray, Michael Isard, Yuan Yu
2012 SIGPLAN Notices
Declarative queries enable programmers to write data manipulation code without being aware of the underlying data structure implementation. By increasing the level of abstraction over imperative code, they improve program readability and, crucially, create opportunities for automatic parallelization and optimization. For example, the Language Integrated Query (LINQ) extensions to C# allow the same declarative query to process in-memory collections, and datasets that are distributed across a compute cluster. However, our experiments show that the serial performance of declarative code is several times slower than the equivalent hand-optimized code, because it is implemented using run-time abstractions (such as iterators) that incur overhead due to virtual function calls and superfluous instructions. To address this problem, we have developed Steno, which uses a combination of novel and well-known techniques to generate code for declarative queries that is almost as efficient as hand-optimized code. Steno translates a declarative LINQ query into type-specialized, inlined and loop-based imperative code. It eliminates chains of iterators from query execution, and optimizes nested queries. We have implemented Steno for uniprocessor, multiprocessor and distributed computing platforms, and show that, for a real-world distributed job, it can almost double the speed of end-to-end execution.

1. LINQ queries are lazily evaluated, and use iterators to communicate elements between stages of the query [2]. An iterator imposes the overhead of two virtual function calls per element per query operator.
2. LINQ queries may be nested, which involves each element flowing through multiple iterators. The iterator overhead is therefore multiplied by the number of nesting levels.
3. The lazy iterator implementation includes state machine logic to simulate coroutine behavior [22], which adds further per-element overhead.
4. An operator's behavior, such as a predicate or transformation function, is specified as a function object, which incurs a further virtual call per element per operator.

To address these overheads, we have implemented Steno: an optimizer for LINQ queries that generates the equivalent loop-based imperative code. Steno performs two optimizations: iterator fusion (Section 4), and nested loop generation (Section 5). Similar optimizers have been developed for functional languages [9, 31] and relational database query languages [15, 23].
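To make the contrast between a chain of lazy iterators and a fused loop concrete, here is a minimal sketch in Python rather than C#. It is only an analogue: Python generators stand in for LINQ iterators, and the helper names (`where`, `select`, `sum_of_even_squares_iterators`, `sum_of_even_squares_fused`) are hypothetical, chosen for illustration. The fused version is written by hand here and is not output produced by Steno.

```python
from typing import Callable, Iterable, Iterator


def where(source: Iterable[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    """Lazy filter stage: every element is pulled through one more iterator."""
    for item in source:
        if predicate(item):  # indirect call through a function object
            yield item


def select(source: Iterable[int], transform: Callable[[int], int]) -> Iterator[int]:
    """Lazy map stage, chained onto whatever iterator precedes it."""
    for item in source:
        yield transform(item)  # another indirect call per element


def sum_of_even_squares_iterators(values: Iterable[int]) -> int:
    """Declarative style: a chain of lazy stages ending in an aggregate."""
    evens = where(values, lambda x: x % 2 == 0)
    squares = select(evens, lambda x: x * x)
    return sum(squares)


def sum_of_even_squares_fused(values: Iterable[int]) -> int:
    """Hand-fused equivalent: a single loop with the predicate and
    transformation inlined, so no intermediate iterators are created."""
    total = 0
    for x in values:
        if x % 2 == 0:
            total += x * x
    return total


data = list(range(10))
# Both forms compute the same result; only the execution strategy differs.
assert sum_of_even_squares_iterators(data) == sum_of_even_squares_fused(data)
```

The iterator version pays a per-element, per-stage cost for resuming each generator and calling each lambda, which loosely mirrors overheads 1, 3 and 4 above. The fused loop removes those layers while computing the same result.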
However, Steno makes several contributions beyond existing work:
doi:10.1145/2345156.1993513 fatcat:lzoqp4hlhnbkjkqw4vft7llw2m