Experimenting lucene index on HBase in an HPC environment

Xiaoming Gao, Vaibhav Nachankar, Judy Qiu
2011 Proceedings of the first annual workshop on High performance computing meets databases - HPCDB '11  
Data intensive computing has been a major focus of scientific computing communities in the past several years, and many technologies and systems have been developed to efficiently store and serve terabytes or even petabytes of data. One important effort in this direction is the HBase system. Modeled after Google's BigTable, HBase supports reliable storage and efficient access to billions of rows of structured data. However, it does not provide an efficient searching mechanism based on column values. To achieve efficient search on text data, this paper proposes a searching framework based on Lucene full-text indices implemented as HBase tables. Leveraging the distributed architecture of HBase, we expect to get high performance and availability, and excellent scalability and flexibility for our searching system. Our experiments are based on data from a real digital library application and carried out on a dynamically constructed HBase deployment in a high-performance computing (HPC) environment. We have completed system design and data loading tasks of this project, and will cover index building and performance tests in future work.
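The idea of storing a full-text index as a table can be illustrated with a minimal in-memory sketch. The layout below (row key = term, column qualifier = document ID, cell value = term frequency) is an assumption made for illustration only; the abstract does not specify the paper's actual table schema, and the function names `build_index_table` and `search` are hypothetical. Plain Python dictionaries stand in for an HBase table, so no HBase or Lucene API is used.

```python
from collections import defaultdict


def build_index_table(docs):
    """Build an inverted index shaped like a sparse table.

    Assumed layout (not taken from the paper):
    row key = term, column qualifier = doc ID, value = term frequency.
    """
    table = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; a real analyzer would do much more.
        for term in text.lower().split():
            table[term][doc_id] = table[term].get(doc_id, 0) + 1
    return dict(table)


def search(table, term):
    """Look up a single row by term and return the matching doc IDs, sorted."""
    return sorted(table.get(term.lower(), {}))


# Hypothetical sample documents.
docs = {"d1": "HBase stores rows", "d2": "Lucene indexes text rows"}
table = build_index_table(docs)
matches = search(table, "rows")
```

With this layout a term query becomes a single row lookup by key, which is the access pattern a row-keyed store serves efficiently.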
doi:10.1145/2125636.2125646 fatcat:5e6y6rjttjcedhlvrwd6w4a4g4