Open issues in parallel query optimization

Waqar Hasan, Daniela Florescu, Patrick Valduriez
1996 SIGMOD Record
We provide an overview of query processing in parallel database systems and discuss several open issues in the optimization of queries for parallel machines.

Daniela Florescu, INRIA, France, Daniela.Florescu@inria.fr

Introduction

Parallel database systems combine data management and parallel processing techniques to provide high performance, high availability, and scalability for data-intensive applications [10, 35]. By exploiting parallel computers, they provide performance at a cheaper price than traditional mainframe solutions. Further, they are the solution of choice for high transaction throughput in OLTP systems as well as low response times in decision-support systems. Finally, parallel databases are the only viable solution for very large databases.

SQL, the standard language for programming database access, is a high-level, set-oriented, declarative language. This permits SQL compilers to automatically infer and exploit parallelism. Users do not have to learn a new language, and application code does not need to be rewritten to benefit from parallel execution. This is to be contrasted with the use of lower-level languages in scientific computing, which often requires rewriting application code to take advantage of parallel machines.

A key to the success of parallel database systems, particularly in decision-support applications, is parallel query optimization. Given a SQL query, parallel query optimization has the goal of finding a parallel plan that delivers the query result in minimal time. While considerable progress has been made, several problems remain open. Further, solutions to the optimization problem are sensitive to the expressive power of the query language, the underlying execution mechanisms, the machine architecture, and variations in the cost metric being minimized. New applications, demands for higher performance from existing applications, and innovations in query execution mechanisms and machine architectures are changing some of the underlying assumptions, thereby offering new challenges.

Parallel query optimization offers challenges beyond those addressed by past research in fields such as distributed databases [30] or classical scheduling theory [18]. While distributed and parallel databases are fundamentally similar, research in distributed query optimization was done in the early 1980s, a time at which

*Current address: Informix Software.
doi:10.1145/234889.234894 fatcat:kbvsxwxmxnhetg5adb47hx6bna