CrowdSJ: Skyline-Join Query Processing of Incomplete Datasets with Crowdsourcing
Skyline queries are widely used in decision-making systems, wireless sensor networks (WSNs), and similar applications. As a variation of the skyline query, the skyline-join query returns results from multiple datasets. However, incomplete datasets are common because of the widespread use of automated information extraction and aggregation. Existing methods for dealing with incomplete data, such as probabilistic estimation and data padding, can address the problem, but they cannot effectively reflect the real situation and lack completeness.
Therefore, to reflect the real situation more accurately and in a more user-centric way, this paper studies the problem of skyline-join queries over incomplete datasets with crowdsourcing, named CrowdSJ. The crowdsourcing-based skyline-join query processing problem over incomplete datasets is divided into two cases. In the first, the skyline-join query involves only the unknown crowdsourcing attribute and the join attribute; we call this Partial Skyline-Join with Crowdsourcing (PSJCrowd). In the second, the skyline-join query involves all the attributes; we call this All Skyline-Join with Crowdsourcing (ASJCrowd). For PSJCrowd, we first filter the known dataset. We then present the level-preference-tree-index and propose the partial skyline-join with crowdsourcing algorithm. For ASJCrowd, we likewise first filter the known dataset. Second, we build a level-preference-tree-index based on the known attributes of the incomplete dataset. Third, we propose CrowdSJ-single, an algorithm for skyline-join with crowdsourcing on a single dataset, to filter the dataset containing unknown attributes. We then build a global level-preference-tree-index based on the known attributes of the incomplete dataset and the complete dataset, and propose CrowdSJ-multiple, an algorithm for skyline-join with crowdsourcing on multiple datasets, which filters the linked tuples based on the global level-preference-tree-index and the results of each round of crowdsourcing. Extensive experiments on synthetic and real datasets demonstrate that our algorithms are efficient and effective.
INDEX TERMS skyline-join query, incomplete data, crowdsourcing, index structure
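As background for the skyline-join operation that the abstract builds on, the following is a minimal sketch of a naive skyline-join over complete data. It is not the CrowdSJ algorithm: it uses no level-preference-tree-index, no filtering, and no crowdsourcing. The function names (`dominates`, `skyline_join`), the tuple layout `(join_key, attributes)`, and the convention that smaller attribute values are preferred are all assumptions made for illustration.

```python
from itertools import product


def dominates(a, b):
    """Return True if vector a dominates vector b.

    Assumption: smaller is better on every attribute. a dominates b when a is
    no worse on all attributes and strictly better on at least one.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_join(left, right):
    """Naive skyline-join (illustrative only): join first, then take the skyline.

    left, right: lists of (join_key, attrs) where attrs is a tuple of numbers.
    Returns the joined tuples that no other joined tuple dominates.
    """
    # Equi-join on the join key; concatenate the attribute vectors.
    joined = [
        (lk, la + ra)
        for (lk, la), (rk, ra) in product(left, right)
        if lk == rk
    ]
    vectors = [attrs for _, attrs in joined]
    # Keep a joined tuple only if no joined vector dominates it.
    return [
        (key, attrs)
        for key, attrs in joined
        if not any(dominates(v, attrs) for v in vectors)
    ]


if __name__ == "__main__":
    left = [("A", (1, 5)), ("B", (2, 2)), ("A", (3, 6))]
    right = [("A", (4,)), ("B", (1,))]
    print(skyline_join(left, right))
```

In this toy input the joined tuple built from `("A", (3, 6))` is dominated and dropped, while the other two joined tuples are incomparable and both remain. When some attribute values are unknown, the dominance test cannot be evaluated directly, which is the gap the abstract proposes to close with crowdsourcing.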