Efficient and high-performance data orchestration for large scale cloud workloads

Shouwei Chen
2021
Computing frameworks running at extreme scale in the cloud provide efficient, high-performance computing services to many domains. These frameworks build scalable, reliable, and highly accessible data pipelines for academic, scientific, and industrial services. While processing large amounts of data from different sources, data analytics generates a large amount of intermediate data behind these frameworks. However, the enormous volume of data makes it challenging for these frameworks to handle data with high performance and efficiency. Data orchestration based on memory and high-performance storage devices has become a key concern in optimizing the performance of these cloud computing frameworks. The increasing data scale and the complexity of the cloud environment make it challenging to run applications quickly and efficiently. Existing computing clusters can fetch data from different kinds of cloud infrastructure, including common storage, high-performance storage devices, and high-speed fabric interconnects. However, it is still challenging to provide the corresponding data orchestration for existing computing frameworks. First, computing frameworks access the underlying persistent data storage layer through different storage devices and memory, and the rapid evolution of storage devices poses new challenges for existing frameworks in using advanced storage devices efficiently. Second, most existing computing frameworks use an intermediate data layer for intermediate storage. However, providing an efficient, high-performance storage layer for large-scale computing frameworks, for example for intermediate data storage and shuffle data storage, is still challenging. Imbalanced and small data storage introduces new challenges that involve new hardware devices and appropriate data orchestration designs. Consequently, the evolution of hardware devices requires a new paradigm of data orchestration for cloud computing frameworks. This thesis address [...]
doi:10.7282/t3-ckq3-tw41 (https://doi.org/10.7282/t3-ckq3-tw41)