Time-evolving graph processing at scale

Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, Ion Stoica
Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (GRADES '16), 2016
Time-evolving graph-structured big data arises naturally in many application domains, such as social networks and communication networks. However, existing graph processing systems lack support for efficient computations on dynamic graphs. In this paper, we represent most computations on time-evolving graphs as (1) a stream of consistent and resilient graph snapshots and (2) a small set of operators that manipulate such streams of snapshots. We then introduce GraphTau, a time-evolving graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphTau quickly builds fault-tolerant graph snapshots as each small batch of new data arrives, and it achieves high-performance, fault-tolerant graph stream processing through a number of optimizations. GraphTau also unifies data stream processing and graph stream processing. Our preliminary evaluations on two representative datasets show promising results. Besides the performance benefit, the GraphTau API relieves programmers from handling graph snapshot generation, windowing operators, and sophisticated differential computation mechanisms.

Introduction

Graph-structured data is on the rise in size, in complexity, and in the dynamism it exhibits. From social networks (e.g., Facebook, Twitter) to telecommunication networks (e.g., cellular networks), applications that generate graph-structured data are ubiquitous, and with the growing interest in the Internet of Things (IoT), this trend is likely to continue. Unlike unstructured datasets, these datasets are dynamic, which gives them a distinguishing characteristic: the graph structure underlying the data evolves over time. Unbounded, real-time data is fast becoming the norm [2], so it is important to process these time-evolving graph-structured datasets efficiently.

Mining time-evolving graphs can reveal insights that benefit businesses. To extract maximum insight, frameworks for time-evolving graph processing must support a variety of analysis tasks. First, they must be able to execute iterative graph algorithms in real time. For example, social networks such as Twitter can recommend products based on an up-to-date TunkRank (a PageRank-like measure) of people in an attention graph [8], and cellular network operators can fix traffic hotspots in their networks as they are detected [12]. Second, analytics tasks typically involve
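The abstract's decomposition, a stream of consistent graph snapshots plus a small set of operators over that stream, can be illustrated with a minimal, single-machine sketch. It does not use Spark or GraphTau's actual API: the names `Batch`, `Snapshot`, `snapshot_stream`, and `window` are hypothetical, and each micro-batch is assumed to carry only edge additions and removals.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, List, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class Batch:
    """Hypothetical micro-batch of edge updates arriving at one time step."""
    added: Tuple[Edge, ...] = ()
    removed: Tuple[Edge, ...] = ()


@dataclass(frozen=True)
class Snapshot:
    """An immutable graph snapshot tagged with the time step that produced it."""
    time: int
    edges: FrozenSet[Edge]


def snapshot_stream(batches: Iterable[Batch]) -> Iterator[Snapshot]:
    """Fold each micro-batch into the previous snapshot and yield a new snapshot.

    Additions are applied before removals, so an edge both added and removed
    in the same batch is absent from the resulting snapshot.
    """
    edges: FrozenSet[Edge] = frozenset()
    for t, batch in enumerate(batches):
        edges = (edges | frozenset(batch.added)) - frozenset(batch.removed)
        yield Snapshot(t, edges)


def window(snapshots: Iterable[Snapshot], size: int) -> Iterator[List[Snapshot]]:
    """Sliding-window operator: after each new snapshot, emit the last `size` snapshots."""
    buf: List[Snapshot] = []
    for snap in snapshots:
        buf.append(snap)
        if len(buf) > size:
            buf.pop(0)  # drop the oldest snapshot once the window is full
        yield list(buf)


if __name__ == "__main__":
    batches = [
        Batch(added=(("a", "b"), ("b", "c"))),
        Batch(added=(("c", "a"),)),
        Batch(removed=(("a", "b"),)),
    ]
    for w in window(snapshot_stream(batches), size=2):
        print([(s.time, len(s.edges)) for s in w])
```

Run as a script, this prints `[(0, 2)]`, `[(0, 2), (1, 3)]`, and `[(1, 3), (2, 2)]`: each window holds at most the two most recent snapshots. Because every snapshot is immutable and derived deterministically from its predecessor plus one batch, a lost snapshot could in principle be rebuilt by replaying batches, which is the kind of lineage-based resilience a dataflow system such as Spark relies on; this sketch only preserves that property and does not implement recovery.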
doi:10.1145/2960414.2960419 dblp:conf/grades/IyerLDS16
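The first requirement named in the introduction, running iterative graph algorithms such as TunkRank in real time, can be sketched by re-running PageRank on each new snapshot and warm-starting it from the previous snapshot's ranks. This is a minimal sketch under stated assumptions: PageRank stands in for TunkRank, snapshots are plain Python edge sets, the function names `pagerank` and `ranks_over_snapshots` are hypothetical, and warm-starting is only one simple form of incremental computation, not necessarily the differential computation mechanism the paper describes.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def pagerank(edges: Set[Edge], init: Optional[Dict[str, float]] = None,
             damping: float = 0.85, tol: float = 1e-10,
             max_iter: int = 1000) -> Tuple[Dict[str, float], int]:
    """Power-iteration PageRank on a directed edge set; returns (ranks, iterations).

    If `init` is given (e.g., ranks from the previous snapshot), iteration starts
    from it; vertices missing from `init` start at 1/n, then the vector is renormalized.
    Dangling vertices (no out-edges) spread their rank uniformly over all vertices.
    """
    nodes = sorted({u for edge in edges for u in edge})
    n = len(nodes)
    if n == 0:
        return {}, 0
    out: Dict[str, List[str]] = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    start = init or {}
    rank = {u: start.get(u, 1.0 / n) for u in nodes}
    total = sum(rank.values())
    rank = {u: r / total for u, r in rank.items()}  # ranks sum to 1
    for it in range(1, max_iter + 1):
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        delta = sum(abs(new[u] - rank[u]) for u in nodes)  # L1 change this step
        rank = new
        if delta < tol:
            return rank, it
    return rank, max_iter


def ranks_over_snapshots(
        snapshots: Iterable[Set[Edge]]) -> Iterator[Tuple[Dict[str, float], int]]:
    """Recompute ranks on each snapshot, warm-started from the previous snapshot's ranks."""
    prev: Optional[Dict[str, float]] = None
    for edges in snapshots:
        prev, iters = pagerank(edges, init=prev)
        yield prev, iters
```

Feeding the same edge set twice, for example `list(ranks_over_snapshots([g, g]))`, makes the second run converge in a single iteration, because it starts at the previous fixed point and each PageRank step contracts the L1 distance by at least the damping factor. When only a few edges change between snapshots, the previous ranks are usually a close starting point, but the saving is not guaranteed.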