Desiderata for a Big Data Language

David Maier
2015 Conference on Innovative Data Systems Research  
Data management and analytics systems for big data have proliferated, including column stores, array databases, graph-analysis environments and linear-algebra packages. This burgeoning of systems has led to a surfeit of languages and APIs. It is time to consider a new framework that can span these systems and simplify the programming and maintenance of Big Data applications. There are two key goals for such a framework:
Portability: It should be relatively easy to move an application or tool developed on one platform to operate against another. As a corollary, back-end data and analytics services should be swappable in a particular platform.
Multi-Server Applications: It will be more common than not that a given application will need the services of multiple systems. The framework should make it easy to construct and deploy such applications.
Such an organizing framework needs a central abstraction to facilitate communication between front-end clients and back-end services. LINQ (Language Integrated Query) provides an example of such a framework, albeit for a narrower class of structures and operations. In LINQ, the central abstraction is the Standard Query Operator (SQO) API, which defines a collection of functions on ordered collections, such as Select(), Join() and Reverse(). LINQ clients deliver queries as expressions over these operators, and servers (called LINQ Providers) accept SQO expressions as input. There is a wide range of Providers, spanning diverse data types, such as SQL Server, LDAP, XML and RDF. LINQ has many beneficial properties, including:
dblp:conf/cidr/Maier15 fatcat:m7oa2hvxt5ddvlb3uufgutudx4
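
The arrangement the abstract describes (clients build expressions over a shared set of operators, and interchangeable providers accept those expressions) can be illustrated with a minimal Python sketch. This is not LINQ and not the framework the paper proposes. The node classes (Source, Where, Select, Reverse) and the two providers (InMemoryProvider, PlanTextProvider) are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical operator nodes standing in for a shared operator API.
@dataclass(frozen=True)
class Source:
    name: str            # name of a collection known to the provider


@dataclass(frozen=True)
class Where:
    child: "Expr"
    column: str
    value: object        # keep rows where row[column] == value


@dataclass(frozen=True)
class Select:
    child: "Expr"
    column: str          # project a single column


@dataclass(frozen=True)
class Reverse:
    child: "Expr"        # reverse the order of the collection


Expr = Union[Source, Where, Select, Reverse]


class InMemoryProvider:
    """Hypothetical back end that evaluates an expression over Python lists."""

    def __init__(self, tables: Dict[str, List[dict]]):
        self.tables = tables

    def execute(self, expr: Expr):
        if isinstance(expr, Source):
            return list(self.tables[expr.name])
        if isinstance(expr, Where):
            rows = self.execute(expr.child)
            return [r for r in rows if r[expr.column] == expr.value]
        if isinstance(expr, Select):
            return [r[expr.column] for r in self.execute(expr.child)]
        if isinstance(expr, Reverse):
            return list(reversed(self.execute(expr.child)))
        raise TypeError(f"unsupported operator: {expr!r}")


class PlanTextProvider:
    """Hypothetical back end that translates the same expression into a
    textual pipeline, as a stand-in for a system with its own query language."""

    def execute(self, expr: Expr) -> str:
        if isinstance(expr, Source):
            return expr.name
        if isinstance(expr, Where):
            return f"{self.execute(expr.child)} | where {expr.column} == {expr.value!r}"
        if isinstance(expr, Select):
            return f"{self.execute(expr.child)} | select {expr.column}"
        if isinstance(expr, Reverse):
            return f"{self.execute(expr.child)} | reverse"
        raise TypeError(f"unsupported operator: {expr!r}")


# The client writes the query once, as an expression over the operators.
query = Reverse(Select(Where(Source("people"), "city", "Portland"), "name"))

tables = {
    "people": [
        {"name": "Ada", "city": "Portland"},
        {"name": "Bo", "city": "Salem"},
        {"name": "Cy", "city": "Portland"},
    ]
}

in_memory_result = InMemoryProvider(tables).execute(query)
plan_text = PlanTextProvider().execute(query)
```

Because the client depends only on the operator expression, the same query object is handed to either provider unchanged, which is the portability property the abstract asks for. A single application could also send different expressions to different providers, which corresponds to the multi-server goal.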