QMapper for Smart Grid

Yue Wang, Yingzhong Xu, Yue Liu, Jian Chen, Songlin Hu
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
Apache Hive has been widely used by Internet companies for big data analytics applications. It compiles high-level languages into efficient MapReduce workflows, which frees users from complicated and time-consuming programming. The popularity of Hive and of HiveQL-compatible systems such as Impala and Shark has attracted attention from traditional enterprises as well. However, enterprise big data processing systems such as Smart Grid applications often have to migrate their RDBMS-based legacy applications to Hive rather than directly writing new logic in HiveQL. Because the two differ in syntax and cost model, manual translation from SQL in an RDBMS to HiveQL is very difficult, error-prone, and often leads to poor performance. In this paper, we propose QMapper, a tool for automatically translating SQL into proper HiveQL. QMapper consists of a rule-based rewriter and a cost-based optimizer. Experiments based on the TPC-H benchmark demonstrate that, compared to manually rewritten Hive queries provided by Hive contributors, QMapper dramatically reduces query latency on average. Our real-world Smart Grid application also demonstrates its efficiency.
doi:10.1145/2723372.2742792 dblp:conf/sigmod/WangXLCH15 fatcat:nh5ipmuue5eyzgs5z6rwwrh5ym