Efficient Influence Related Queries [thesis]

Jianye Yang
2017
Recently, there has been a surge of interest in mining valuable information from datasets. As one of the most important information mining tasks, influence analysis has drawn tremendous attention in both industry and academia. Due to the large scale of the datasets involved, there is an emerging call for efficient processing of influence-related queries. In this thesis, we study three important influence-related problems regarding three types of data, i.e., product and user preference data,
spatio-textual objects, and set-valued data. Firstly, for product and user preference data, we formulate the problem of influence-based cost optimization on user preference functions, which is critical to unlocking the great scientific and socio-economic value of these data. By utilizing classical k-level computation techniques, we show that the solution space of our problem can be reduced to a finite number of possible positions (points). To process this problem efficiently, we propose a traversal-based 2-dimensional algorithm with linear time complexity. For general multi-dimensional spaces, we develop a space-partition-based exact algorithm. To accelerate the computation, we further devise a randomized sampling method with high accuracy. Secondly, for spatio-textual objects, we present a novel definition of object influence in applications where objects are of different categories. Under this definition, we investigate the problem of finding the top-k most influential objects. We first show that this problem is NP-hard with respect to the number of object categories. To tackle the computational hardness, we then develop efficient nearest-neighbor-set-based exact and approximate algorithms. In particular, our polynomial-time approximation algorithm has a 2-factor performance guarantee. Finally, for set-valued data, we investigate the problem of set containment join, which is an essential and fundamental tool for set-valued data analysis. Based on the computing paradigms, we classify the existing algorithms into two categories, namel [...]
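To make the set containment join mentioned in the abstract concrete, the following is a minimal sketch of what the operation computes: all pairs (r, s) with r drawn from one collection, s from another, and r a subset of s. It assumes small in-memory Python sets, and the function name `set_containment_join` is hypothetical. The approach shown is a plain inverted-index intersection baseline chosen for illustration; it is not presented as any of the algorithms proposed or classified in the thesis.

```python
from collections import defaultdict


def set_containment_join(R, S):
    """Return all index pairs (i, j) such that R[i] is a subset of S[j].

    Illustrative baseline only (hypothetical helper, not the thesis's method).
    """
    # Inverted index: element -> ids of the sets in S that contain it.
    index = defaultdict(set)
    for j, s in enumerate(S):
        for e in s:
            index[e].add(j)

    all_s = set(range(len(S)))
    result = []
    for i, r in enumerate(R):
        # Start from every set in S; the empty set is contained in all of them.
        candidates = set(all_s)
        # Probe the rarest elements first so the candidate set shrinks quickly.
        for e in sorted(r, key=lambda x: len(index.get(x, ()))):
            candidates &= index.get(e, set())
            if not candidates:
                break
        result.extend((i, j) for j in sorted(candidates))
    return result


if __name__ == "__main__":
    R = [{1, 2}, {3}, set()]
    S = [{1, 2, 3}, {2, 3}, {1, 2}]
    for pair in set_containment_join(R, S):
        print(pair)
```

Each set in R is answered by intersecting the posting lists of its elements, so a set with even one rare element is resolved after touching few candidates.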
doi:10.26190/unsworks/19847 fatcat:z4kfyj5dxfdudfccfwvgc6n66m