Lucene Based Block Indexing Technology on Large Email Data

Chunyao Song, Yao Ge, Peng Nie, Xiaojie Yuan
As a warehouse for storing and managing data, a relational database provides an index mechanism to meet users' data management needs. However, when the amount of data is very large or the users' queries are complicated, its simple index structure cannot return accurate query results within a short time. Thus, we need to establish a highly efficient index scheme for large amounts of data. Given that the users' primary requirement is searching for keywords within a specified batch interval on large email data, where each email is associated with a batch attribute, this work builds an email retrieval system using Lucene, a full-text search toolkit. This work presents a scheme that builds the index according to each email's batch attribute and lets the block index and the integrated index coexist. The evaluation shows that our scheme significantly improves the search efficiency of the email retrieval system compared with a basic system that does not allow a hybrid index structure.
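
The coexistence of a block index and an integrated index can be illustrated with a minimal sketch. It is a toy in-memory inverted index in plain Python, not the Lucene-based system described in this work. The class name HybridEmailIndex, the max_blocks threshold, and the rule for choosing between the two indexes are assumptions made purely for illustration.

```python
from collections import defaultdict


class HybridEmailIndex:
    """Toy inverted index: one index per batch plus one over all emails.

    Hypothetical illustration only; names and routing rule are assumptions.
    """

    def __init__(self):
        # Block indexes: batch id -> term -> set of email ids.
        self.block = defaultdict(lambda: defaultdict(set))
        # Integrated index: term -> set of (batch id, email id) pairs.
        self.integrated = defaultdict(set)

    def add(self, email_id, batch, text):
        # Index every distinct lowercase token in both structures.
        for term in set(text.lower().split()):
            self.block[batch][term].add(email_id)
            self.integrated[term].add((batch, email_id))

    def search(self, keyword, first_batch, last_batch, max_blocks=2):
        term = keyword.lower()
        batches = [b for b in self.block if first_batch <= b <= last_batch]
        if len(batches) <= max_blocks:
            # Narrow interval: consult only the matching block indexes.
            hits = set()
            for b in batches:
                hits |= self.block[b].get(term, set())
            return sorted(hits)
        # Wide interval: one lookup in the integrated index, filtered by batch.
        return sorted(
            email_id
            for batch, email_id in self.integrated.get(term, set())
            if first_batch <= batch <= last_batch
        )


idx = HybridEmailIndex()
idx.add(1, 1, "quarterly budget report")
idx.add(2, 2, "budget meeting")
idx.add(3, 5, "holiday schedule")
idx.add(4, 5, "Budget review")

narrow = idx.search("budget", 1, 2)  # answered from the block indexes
wide = idx.search("budget", 1, 5)    # answered from the integrated index
```

Both paths return the same kind of result, so the choice between them only affects how much index data is touched. In this sketch a narrow batch interval reads a few small per-batch indexes, while a wide interval avoids visiting many blocks by using the single integrated index.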