On the benefits of resource disaggregation for virtual data centre provisioning in optical data centres

Albert Pagès, Rubén Serrano, Jordi Perelló, Salvatore Spadaro
Computer Communications, 2017
Virtual Data Centre (VDC) allocation requires the provisioning of both computing and network resources. Their joint provisioning allows for an optimal utilization of the physical Data Centre (DC) infrastructure resources. However, traditional DCs can suffer from computing resource underutilization due to the rigid capacity configurations of the server units, resulting in high computing resource fragmentation across the DC servers. To overcome these limitations, the disaggregated DC paradigm has
recently been introduced. Thanks to resource disaggregation, it is possible to allocate the exact amount of resources needed to provision a VDC instance. In this paper, we focus on the static planning of a shared optically interconnected disaggregated DC infrastructure to support a known set of VDC instances to be deployed on top. To this end, we provide optimal and suboptimal techniques to determine the necessary capacity (both in terms of computing and network resources) required to support the expected set of VDC demands. Next, we quantitatively evaluate the benefits yielded by the disaggregated DC paradigm compared to traditional DC architectures, considering various VDC profiles and Data Centre Network (DCN) topologies.

The constant growth of Internet traffic and cloud services, fostered by bandwidth-hungry applications/paradigms such as Big Data, the Internet of Things (IoT) and Video on Demand (VoD), calls for larger DC infrastructures in terms of both computing and network capacities, in order to accommodate all applications and workflows. For instance, it is forecast that the global IP traffic managed by DCs will almost double by the year 2019, rising from 5.6 ZB to 10.4 ZB per year, with around 75% of the traffic staying inside their premises [2]. This unprecedented traffic growth is pushing the capabilities of current electrical-based DCN fabrics beyond their limits. For this reason, special attention is being paid to improving the performance of intra-DC networks in the development of future DC architectures. In this regard, optical technologies have gained considerable interest due to their superior scalability, bandwidth and latency, as well as their reduced power consumption. Hence, many efforts are being devoted to integrating them into future DCNs [3], based on either hybrid electrical/optical (e.g., as in [4]) or all-optical (e.g., see [5], [6]) network fabrics for the communication of servers inside the DC.
Despite such efforts to improve the performance of DCNs, current server-centric DCs still face limitations toward efficient computing resource utilization. In general, services/tasks in DCs are executed on top of Virtual Machines (VMs) that are deployed at servers. Each VM is provisioned with a set of computing resources (i.e., CPU cores, storage and memory) tailored to the computational needs of its applications. These resources are then allocated and dedicated to the VM during its whole lifecycle. Multiple VMs can coexist inside the same server as long as the total amount of resources they request does not exceed the server's total resource capacity. However, heterogeneous VM computing resource demands can lead to server underutilization. For instance, an application/service (i.e., a VM) running on a server may consume almost all of one resource type (e.g., CPU cores) while imposing almost no requirements on the others (e.g., storage, memory). As a result, it may be impossible to allocate another application to the same server due to the scarcity of that resource type, leaving the remaining resources underutilized. As an example of this phenomenon, Google has recently published data regarding the utilization of their DC infrastructures, disclosing a high disparity of storage/memory to CPU usage across their tasks [7]. Furthermore, it becomes even more difficult to dynamically configure the DC resources under an unpredictable traffic profile.

Aside from poor resource utilization, server-centric architectures also suffer from limited modularity, which impacts system-wide performance. Traditional servers are usually built by tightly integrating their components (CPU, memory modules, disk, network interface card, etc.) into a single motherboard. This has been the basis of computer manufacturing for many years.
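The fragmentation effect described above can be sketched with a toy allocation check (the capacity and demand figures are made up for illustration and are not from the paper): two servers each host a CPU-heavy VM, and a further modest request fits in neither server individually, yet would fit if CPU and memory were pooled independently, as in a disaggregated DC.

```python
def fits(free, demand):
    """A request fits only if every resource type is simultaneously available."""
    return all(f >= d for f, d in zip(free, demand))

# Resource vectors: (CPU cores, memory GB). Each server started at (16, 64)
# and already hosts a CPU-heavy VM, leaving (1, 56) free.
free_per_server = [(1, 56), (1, 56)]
vm_new = (2, 16)

# Server-centric DC: the new VM must fit entirely within one server.
server_centric_ok = any(fits(f, vm_new) for f in free_per_server)
print(server_centric_ok)          # False: no single server has 2 free cores

# Disaggregated DC: CPU and memory form independent pools, so free
# capacity of each type adds up across the infrastructure.
pooled_free = tuple(map(sum, zip(*free_per_server)))
print(pooled_free)                # (2, 112)
print(fits(pooled_free, vm_new))  # True: the request can be provisioned
```

The memory left stranded on each server (56 GB each) is exactly the fragmentation the disaggregated paradigm aims to eliminate.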
However, this tight integration limits the possibilities for improving overall system performance, mainly because the rates at which the different components scale (in size, speed, etc.) differ substantially. For instance, CPU performance has increased at a rate of about 60% per year, while DRAM memory performance has improved by merely around 7% per year, leading to a performance gap between CPU and memory that widens by about 50% per year [8]. Such a disparity in the evolution of the different kinds of server components prevents the most advanced technology from being adopted in some cases, since compromise decisions must be made to preserve good overall system performance.
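The roughly 50% figure follows directly from compounding the two quoted growth rates (a small illustrative calculation; the 5-year projection is our own extrapolation, not a claim from the paper):

```python
# CPU performance grows ~60%/year, DRAM performance ~7%/year, so the
# relative CPU-memory gap widens each year by the ratio of the two rates:
cpu_growth, dram_growth = 0.60, 0.07
gap_per_year = (1 + cpu_growth) / (1 + dram_growth) - 1
print(round(gap_per_year, 3))              # 0.495, i.e. roughly 50% per year

# The gap compounds: after 5 years CPU has pulled ahead of memory by ~7.5x.
print(round((1 + gap_per_year) ** 5, 1))   # 7.5
```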
doi:10.1016/j.comcom.2017.03.009