A performance evaluation of Hive for scientific data management

Taoying Liu, Jing Liu, Hong Liu, Wei Li
2013 IEEE International Conference on Big Data
Evaluating MapReduce-based frameworks for scientific data processing applications is very important, because scientists urgently need a low-cost, scalable, easy-to-use and fault-tolerant platform for processing large volumes of data. This paper presents an implementation of a scientific data management benchmark, SSDB, on Hive, a MapReduce-based data warehouse. A complete strategy for migrating SSDB to Hive is described in detail, including the HQL implementation of the queries, the data partition schema, and adjustments of the underlying storage facilities. We have tuned performance using several system parameters provided by Hive, Hadoop and HDFS. This paper provides preliminary results and analysis. Evaluation results indicate that Hive achieves acceptable performance for some data analysis tasks, even compared with some highly efficient distributed parallel databases, but it needs careful adjustment of its underlying storage facilities and indexing mechanism.
doi:10.1109/bigdata.2013.6691696 dblp:conf/bigdataconf/LiuLLL13 fatcat:mvgsipp2dbeujh6r3t3caexzca