Workload Modelling and Elasticity Management of Data-Intensive Systems [article]

Ali Reza Khoshkbar Foroushha, University, The Australian National, University, The Australian National
Efficiently and effectively processing large volume of data (often at high velocity) using an optimal mix of data-intensive systems (e.g., batch processing, stream processing, NoSQL) is the key step in the big data value chain. Availability and affordability of these data-intensive systems as cloud managed services (e.g, AmazonElastic MapReduce, Amazon DynamoDB) have enabled data scientists and software engineers to deploy versatile data analytics flow applications, such as click-stream
more » ... and collaborative filtering with less efforts. Although easy to deploy, run-time performance and elasticity management of these complex data analytics flow applications has emerged as a major challenge. As we discuss later in this thesis, the data analytics flow applications combine multiple programming models for per-forming specialized and pre-defined set of activities, such as ingestion, analytics, and storage of data. To support users across such heterogeneous workloads where they are charged for every CPU cycle used and every data byte transferred in or out of the cloud datacenter, we need a set of intelligent performance and workload management techniques and tools. Our research methodology investigates and develops these techniques and tools by significantly extending the well known formal mod-els available from other disciplines of computer science including machine learning, optimization and control theory. To this end, this PhD dissertation makes the following core research contributions: a) investigates a novel workload prediction models (based on machine learn-ing techniques, such as Mixture Density Networks) to forecast how performance parameters of data-intensive systems are affected due to run-time variations in dataflow behaviours (e.g. data volume, data velocity, query mix) b) investigates control-theoretic approach for managing elasticity of data-intensive systems for ensuring the achievement of service level objectives. In the former (a), we propose a novel application of Mixture Density Networks in dis [...]
doi:10.25911/5d514439629c1 fatcat:wqc7ysigqjenhalej3m3drodga