Workload-aware table splitting for NoSQL

Francisco Cruz, Francisco Maia, Rui Oliveira, Ricardo Vilaça
2014 Proceedings of the 29th Annual ACM Symposium on Applied Computing - SAC '14  
Massive-scale data stores, which exhibit highly desirable scalability and availability properties, are becoming pivotal systems in today's infrastructures. The scalability achieved by these data stores is anchored on data independence: there is no clear relationship between data, and atomic inter-node operations are not a concern. Such an assumption over data allows aggressive data partitioning. In particular, data tables are horizontally partitioned and spread across nodes for load balancing. However, in current versions of these data stores, partitioning is either a manual process or automated but based simply on table size. We argue that size-based partitioning does not lead to acceptable load balancing as it ignores data access patterns, namely data hotspots. Moreover, manual data partitioning is cumbersome and typically infeasible in large-scale scenarios. In this paper we propose an automated table splitting mechanism that takes into account the system workload. We evaluate this mechanism, showing that it is simple, non-intrusive and effective.
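The contrast between size-based and workload-aware splitting can be illustrated with a minimal sketch. This is not the authors' mechanism, which the abstract does not detail. It assumes a single table region with sorted row keys and a hypothetical per-key request counter, and the function names and sample numbers are invented for illustration.

```python
from itertools import accumulate


def size_based_split(keys):
    """Return the first key of the right half when halving the row count."""
    return keys[len(keys) // 2]


def workload_aware_split(keys, hits):
    """Return the first key of the right half when halving observed requests.

    Assumes len(keys) >= 2 and that hits[i] counts requests for keys[i].
    """
    total = sum(hits)
    prefix = list(accumulate(hits))  # prefix[i] = requests for keys[0..i]
    # Splitting at index i puts keys[:i] on the left, with load prefix[i - 1].
    best = min(range(1, len(keys)), key=lambda i: abs(2 * prefix[i - 1] - total))
    return keys[best]


def left_load_share(keys, hits, split_key):
    """Fraction of all requests that target keys below split_key."""
    left = sum(h for k, h in zip(keys, hits) if k < split_key)
    return left / sum(hits)


# Hypothetical region: ten rows, with a hotspot on the first two keys.
keys = ["row%02d" % i for i in range(10)]
hits = [300, 200, 100, 100, 100, 50, 50, 40, 40, 20]

for name, split in (
    ("size-based", size_based_split(keys)),
    ("workload-aware", workload_aware_split(keys, hits)),
):
    print(name, split, left_load_share(keys, hits, split))
```

With this skewed sample, the size-based split halves the rows but leaves most requests on one side, while the request-weighted split gives both sides a similar share of the load at the cost of unequal row counts.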
doi:10.1145/2554850.2555027 dblp:conf/sac/CruzMOV14 fatcat:7ao2bu5kz5ez5nggilrxiqku2q