Partitioned multi-indexing

Bahman Bahmani, Ashish Goel
2012 Proceedings of the 21st international conference on World Wide Web - WWW '12  
To answer search queries on a social network rich with usergenerated content, it is desirable to give a higher ranking to content that is closer to the individual issuing the query. Queries occur at nodes in the network, documents are also created by nodes in the same network, and the goal is to find the document that matches the query and is closest in network distance to the node issuing the query. In this paper, we present the "Partitioned Multi-Indexing" scheme, which provides an
more » ... solution to this problem. With m links in the network, after an offlineÕ(m) pre-processing time, our scheme allows for social index operations (i.e., social search queries, as well as insertion and deletion of words into and from a document at any node), all in timeÕ(1). Further, our scheme can be implemented on open source distributed streaming systems such as Yahoo! S4 or Twitter's Storm so that every social index operation takesÕ(1) processing time and network queries in the worst case, and just two network queries in the common case where the reverse index corresponding to the query keyword is much smaller than the memory available at any distributed compute node. Building on Das Sarma et al.'s approximate distance oracle, the worst-case approximation ratio of our scheme is O(1) for undirected networks. Our simulations on the social network Twitter as well as synthetic networks show that in practice, the approximation ratio is actually close to 1 for both directed and undirected networks. We believe that this work is the first demonstration of the feasibility of social search with real-time text updates at large scales.
doi:10.1145/2187836.2187891 dblp:conf/www/BahmaniG12 fatcat:divs47skjvbtzmyczn7pammcmu