SQL query log analysis for identifying user interests and query recommendations

Natalia Arzamasova
2020
In the sciences and elsewhere, the use of relational databases has become ubiquitous. To get maximum profit from a database, one should have in-depth knowledge in both SQL and a domain (data structure and meaning that a database contains). To assist inexperienced users in formulating their needs, SQL query recommendation system (SQL QRS) has been proposed. It utilizes the experience of previous users captured by SQL query log as well as the user query history to suggest. When constructing such
more » ... constructing such a system, one should solve related problems: (1) clean the query log and (2) define appropriate query similarity functions. These two tasks are not only necessary for building SQL QRS, but they apply to other problems. In what follows, we describe three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) SQL query log clustering when testing SQL query similarity functions and (3) recommending SQL queries. We also explain how these three branches are related to each other. Scenario 1. Cleaning SQL query log as a general pre-processing step The raw query log is often not suitable for query log analysis tasks such as clustering, giving recommendations. That is because it contains antipatterns and robotic data downloads, also known as Sliding Window Search (SWS). An antipattern in software engineering is a special case of a pattern. While a pattern is a standard solution, an antipattern is a pattern with a negative effect. When it comes to SQL query recommendation, leaving such artifacts in the log during analysis results in a wrong suggestion. Firstly, the behaviour of "mortal" users who need a recommendation is different from robots, which perform SWS. Secondly, one does not want to recommend antipatterns, so they need to be excluded from the query pool. Thirdly, the bigger a log is, the slower a recommendation engine operates. Thus, excluding SWS and antipatterns from the input data makes the recommendation better and faster. The effect of SWS and antipatterns on query log clustering depends on the chosen [...]
doi:10.5445/ir/1000126161 fatcat:zz7bywognvdatkgosentg5ffsm