Workload-Aware Data Management in Shared-Nothing Distributed OLTP Databases

Joarder Mohammad Mustafa Kamal
2017
More than 3.26 billion Internet users are using Web and mobile applications every day for retail services and businesses processing; information management and retrievals; media and social interactions; gaming and entertainments, etc. With the rapid growth in simultaneous users and data volume, such applications often rely on distributed databases to handle large magnitudes of On-Line Transaction Processing (OLTP) requests at the Internet scale. These databases employ user-defined partitioning
more » ... chemes during their initial application design phase to scale-out concurrent writing of user data into a number of shared-nothing servers. However, processing Distributed Transactions (DTs), that involve data tuples from multiple servers, can severely degrade the performance of these databases. In addition, transactional patterns are often influenced by the frequent data tuples in the database, resulting in significant changes in the workload that none of the static partitioning schemes can handle in real-time. This thesis addresses the abovementioned challenge through workload-aware incremental data repartitioning. The primary goal is to minimally redistribute data tuples within the system with a view to minimising the adverse impact of DT without adversely affecting the data distribution load balance. Furthermore, to support both range and consistent-hash based data partitions used in OLTP systems, a distributed yet scalable data lookup process, inspired by the roaming protocol in mobile networks, is introduced. Moreover, additional challenge lies on how to reduce the computational complexities while processing large volume of transactional workload very quickly to make real-time decisions. These complexities are addressed in three distinct ways — hide the workload network complexities during analysis by using different level of abstractions; find the appropriate size of observation window to decide how far to look back in the past for workload analysis; and finally, incrementally (i.e., how frequently) trigger the repar [...]
doi:10.4225/03/58b7a71ee7463 fatcat:rhqsy74p35ffbhzgjmduuh73eq