Piecing Together Large Puzzles, Efficiently: Towards Scalable Loading Into Graph Database Systems

Gabriel Campero Durand, Jingy Ma, Marcus Pinnecke, Gunter Saake
2018 Workshop Grundlagen von Datenbanken  
Many applications rely on network analysis to extract business intelligence from large datasets, requiring specialized graph tools such as processing frameworks (e.g. Apache Giraph, Gradoop), database systems (e.g. Neo4j, JanusGraph) or applications/libraries (e.g. NetworkX, nvGraph). A recent survey reports scalability, particularly for loading, as the foremost practical challenge faced by users. In this paper we consider the design space of tools for efficient and scalable graph bulk loading.
more » ... For this we implement a prototypical loader for a property graph DBMS, using a distributed message bus. With our implementation we evaluate the impact and limits of basic optimizations. Our results confirm the expectation that bulk loading can be best supported as a server-side process. We also find, for our specific case, gains from batching writes (up to 64x speedups in our evaluation), uniform behavior across partitioning strategies, and the need for careful tuning to find the optimal configuration of batching and partitioning. In future work we aim to study loading into alternative physical storages with GeckoDB, an HTAP database system developed in our group.
dblp:conf/gvd/DurandMPS18 fatcat:joffwprfozf4bdpiqbwzuh2hvi