Approximating distributed graph algorithms
Interest in processing data with an underlying graph structure has grown in recent years, which has led to the development of many distributed graph processing systems. However, these systems still take several minutes or even hours to execute popular graph algorithms, while the amount of data, for example the World Wide Web or social graphs, keeps growing fast. This raises the question: do we always need the exact answer for a large graph? In other fields, such as big data analytics, approximation has recently gained interest: the user can decide on the accuracy of the result, and if a less accurate result is acceptable, the computation can be sped up. Approximation techniques also exist for distributed event-based systems, such as publish/subscribe, and for stream processing systems. For distributed graph processing, however, only a few approaches provide approximation techniques, and most of them concentrate on sparsifying the graph or approximating the vertex function itself. Yet the bottleneck in distributed graph processing arises mainly from the message passing between vertices. This thesis therefore investigates message dropping for the PageRank algorithm. Two ways of dropping messages are investigated: dropping individual messages based on message properties, and dropping all messages on selected edges (edge sampling). The dropping aims to reduce the runtime while minimizing the error. Both approaches are tested with different properties, and a detailed analysis of the results of both approaches and of the different properties is presented. The evaluation is done on three real-world graphs, and the error metrics used for the evaluation are also described in this thesis.
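The two dropping strategies can be made concrete with a minimal single-machine sketch of synchronous, vertex-centric (Pregel-style) PageRank in which every rank contribution sent along an edge counts as one message. This is not the thesis implementation. Several details are illustrative assumptions: the function names, the damping factor 0.85, the fixed iteration count, the example message property (a minimum message value), the random edge-sampling rule and the L1 error metric. For simplicity, rank mass at vertices without out-edges is not redistributed.

```python
import random


def pagerank(graph, iterations=20, damping=0.85, keep_edge=None, keep_message=None):
    """Synchronous vertex-centric PageRank that counts the messages it sends.

    graph: dict mapping each vertex to a list of its out-neighbours.
    keep_edge(u, v): optional; messages on edge (u, v) are sent only if True.
    keep_message(value): optional; a single message is sent only if True.
    Returns (ranks, number_of_messages_sent).
    """
    vertices = list(graph)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    sent = 0
    for _ in range(iterations):
        incoming = {v: 0.0 for v in vertices}
        for u in vertices:
            out = graph[u]
            if not out:
                continue  # dangling vertex: its rank mass is simply lost here
            share = rank[u] / len(out)  # the message value sent to each neighbour
            for v in out:
                if keep_edge is not None and not keep_edge(u, v):
                    continue  # edge sampling: this edge never carries messages
                if keep_message is not None and not keep_message(share):
                    continue  # individual dropping based on a message property
                incoming[v] += share
                sent += 1
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in vertices}
    return rank, sent


def sample_edges(graph, keep_fraction, seed=0):
    """Hypothetical edge sampling: decide once which edges may carry messages."""
    rng = random.Random(seed)
    kept = {(u, v) for u, out in graph.items() for v in out
            if rng.random() < keep_fraction}
    return lambda u, v: (u, v) in kept


def above_threshold(value, threshold=0.05):
    """Hypothetical message property: drop messages that carry little rank."""
    return value >= threshold


def l1_error(exact, approx):
    """Assumed error metric: L1 distance between two rank vectors."""
    return sum(abs(exact[v] - approx[v]) for v in exact)


# Toy graph in which every vertex has at least one out-edge.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
exact, sent_exact = pagerank(graph)
sampled, sent_sampled = pagerank(graph, keep_edge=sample_edges(graph, 0.5))
thresh, sent_thresh = pagerank(graph, keep_message=above_threshold)

for name, ranks, sent in [("edge sampling", sampled, sent_sampled),
                          ("threshold", thresh, sent_thresh)]:
    print(f"{name}: {sent}/{sent_exact} messages, "
          f"L1 error {l1_error(exact, ranks):.4f}")
```

In this sketch, edge sampling fixes the set of silent edges once, before the first iteration. The per-message rule, by contrast, is re-evaluated for every message, so a vertex whose rank stays below the threshold (vertex 3 here, which has no in-edges) stops sending altogether after the first iteration.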