Enabling technologies for future data center networking: a primer

Min Chen, Hai Jin, Yonggang Wen, V. C. M. Leung
IEEE Network, July/August 2013
Abstract

The increasing adoption of cloud services is demanding the deployment of more data centers. Data centers typically house a huge amount of storage and computing resources, in turn dictating better networking technologies to connect the large number of computing and storage nodes. Data center networking (DCN) is an emerging field that studies the networking challenges in data centers. In this article, we present a survey of enabling DCN technologies for future cloud infrastructures, through which the huge amount of resources in data centers can be efficiently managed. Specifically, we start with a detailed investigation of the architecture, technologies, and design principles for future DCN. Following that, we highlight some of the design challenges and open issues that should be addressed for future DCN to improve its energy efficiency and increase its throughput while lowering its cost.

Currently, an increasing number of science and engineering applications involve big data and intensive computing, which place increasingly high demands on network bandwidth, response speed, and data storage. Due to complicated management and low operational efficiency, existing computing and service modes struggle to meet these demands. As a novel computing and service mode, cloud computing has already become prevalent, and is attracting extensive attention from both academia and industry. Companies find this mode appealing because it requires smaller investments to deploy new businesses and extensively reduces operation and maintenance costs, lowering the risk of supporting new services. The core idea of cloud computing is the unification and dispatch of networked resources via a resource pool to provide virtual processing, storage, bandwidth, and so on. To achieve this goal, it is critical to evolve the existing network architecture into a cloud network platform that offers high performance, large bandwidth capacity, and good scalability while maintaining a high level of quality of service (QoS). The cloud network platform from which applications obtain services is referred to as a data center network (DCN).

A traditional DCN interconnects servers through electronic switching with a limited number of ports per switch, and usually employs a multi-tier interconnection architecture to extend the port count and provide full, non-blocking connectivity; this approach suffers from poor scalability. However, with the convergence of cloud computing with social media and mobile communications, the types of data traffic are becoming more diverse while the number of clients is increasing exponentially. The traditional DCN thus faces two major shortcomings in terms of scalability and flexibility:
• From the interconnection architecture point of view, it is hard to physically extend the network capacity of the traditional DCN to scale with fluctuating traffic volumes and satisfy the increasing traffic demand.
• From the multi-client QoS support point of view, the traditional design is insensitive to the varied QoS requirements of a large number of clients. It is also challenging to virtualize a private network for each individual client that meets specific QoS requirements while minimizing resource redundancy.

The modern data center (DC) can contain as many as 100,000 servers, and the required peak communication bandwidth can reach up to 100 Tb/s [1]. Meanwhile, the number of supported users can be very large. As predicted in [2], the number of mobile cloud computing subscribers worldwide is expected to grow rapidly, rising from 42.8 million subscribers in 2008 (approximately 1.1 percent of all mobile subscribers) to just over 998 million in 2014 (nearly 19 percent). Given the requirements of huge bandwidth capacity and support for a very large number of clients, how to design a future DCN is a hot topic, and the following challenging issues should be addressed.

Scalable bandwidth capacity with low cost: To satisfy the requirement of non-blocking bisection bandwidth among servers, huge bandwidth capacity should be provided by an efficient interconnection architecture, while cost and complexity should be kept as low as possible.

Energy efficiency: DCs consume a huge amount of power and account for about 2 percent of the greenhouse gas emissions that are exacerbating global warming. Typically, the annual energy use of a DC (2 MW) equals the energy consumed by around 5000 U.S. cars over the same period. The power consumption of DCs comes from several sources, including switching and transmitting data traffic, storage and computation within numerous servers, cooling systems, and power distribution loss. Energy-aware optimization policies are critical for a green DCN.

User-oriented QoS provisioning: A large-scale DC carries various kinds of requests with different importance or priority levels from many individual users. QoS provisioning should be differentiated among users; even for the same user, QoS requirements can change dynamically over time.

Survivability/reliability: In case of system failures, communications should be maintained so as to offer almost uninterrupted services. It is therefore crucial to design finely tuned redundancy that achieves the desired reliability and stability with the least resource waste.

In this article, the technologies for building a future DCN are classified into three categories: DCN architecture, inter-DCN communications, and technologies supporting large-scale clients. The rest of this article addresses these three categories in turn. We present various DCN interconnection architectures; we then describe emerging communication techniques for connecting multiple DCNs; and we discuss the design issues in supporting QoS requirements for large-scale clients. We also provide a detailed description of a novel testbed, Cloud3DView, for modular DCs, and outline some future research issues and trends. Finally, we give our concluding remarks.

A large DCN may comprise hundreds of thousands or even more servers. These servers are typically connected through a two-level hierarchical architecture (i.e., a fat-tree topology).
In the first level, the servers in the same rack are connected to a top-of-rack (ToR) switch. In the second level, ToR switches are interconnected through higher-layer switches. The key to meeting the requirements of huge bandwidth capacity and high-speed communications in a DCN is to design an efficient interconnection architecture. In this section, we classify the networking architectures inside a DC into four categories: electronic switching, wireless, all-optical switching, and hybrid electronic/optical switching; each is detailed below.

Electronic Switching Technologies

Although researchers have recently conducted extensive investigations into various structures for DCNs, most designs are based on electronic switching [3, 4]. In an electronic-switching-based DCN, the number of switching ports supported by an electronic switch is limited. To provide a sufficient number of ports for non-blocking communications among a huge number of servers, a server-oriented multi-tier interconnection architecture is usually employed. Due to the hierarchical structure, oversubscription and unbalanced traffic are intrinsic problems of electronic-switching-based DCN. The limitations of such an architecture are that the number of required network devices is very large, the corresponding construction cost is high, and the network energy consumption is also high. The key to tackling this challenge is to provide balanced communication bandwidth between any arbitrary pair of servers, so that any server in the DCN can communicate with any other server at full network interface card (NIC) bandwidth. To ensure that the aggregation/core layer of the DCN is not oversubscribed, more links and switches are added to facilitate multipath routing [3, 4]. However, the performance gain is traded off against the increased cost of additional hardware and greater networking complexity.
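As a back-of-envelope illustration of why non-blocking electronic fabrics require so many devices, the standard k-ary fat-tree arithmetic used by commodity multipath designs such as [3] can be sketched in a few lines of Python. The function names and example numbers below are illustrative, not taken from the article:

```python
def fat_tree_size(k):
    """Device counts for a k-ary fat-tree built from k-port switches.

    Standard results: k pods, (k/2)^2 core switches, k/2 edge and
    k/2 aggregation switches per pod, and k^3/4 host-facing ports,
    all at full (1:1) bisection bandwidth.
    """
    hosts = k ** 3 // 4
    core = (k // 2) ** 2
    edge = agg = k * (k // 2)          # k pods, k/2 switches each
    return {"hosts": hosts, "switches": core + edge + agg}

def oversubscription(host_gbps, hosts_per_rack, tor_uplink_gbps):
    """Ratio of worst-case demand below a ToR to its uplink capacity."""
    return (host_gbps * hosts_per_rack) / tor_uplink_gbps

# A fat-tree of 48-port switches already reaches DC scale:
print(fat_tree_size(48))   # {'hosts': 27648, 'switches': 2880}
# 40 servers at 10 Gb/s behind 4 x 40 Gb/s ToR uplinks:
print(oversubscription(10, 40, 160))  # 2.5 (i.e., 2.5:1 oversubscribed)
```

The 48-port example shows the trade-off in the text: roughly 27,000 servers can be connected without blocking, but only at the price of nearly 3,000 switches and the associated cabling, cost, and energy.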
Wireless Data Center

Recently, the 60 GHz spectrum has been utilized to tackle the hotspot problem caused by oversubscription and unbalanced traffic [5]. 60 GHz transceivers are deployed on ToR switches to provide supplemental routing paths in addition to the traditional wired links in DCNs. However, due to the intrinsic line-of-sight limitation of 60 GHz wireless links, realizing a wireless DC is quite challenging. To alleviate this problem, a novel 3D wireless DC has been proposed to solve the link blockage and radio interference problems [6]: wireless signals bounce off the DC ceiling to establish non-blocked wireless connections. In [7], a wireless flyway system is designed to set up the most beneficial flyways and route over them both directly and indirectly, reducing congestion on hot links. The scheduling problem in wireless DCN was first identified and formulated in [8], an important foundation for further work in this area. An idealized theoretical model considering both wireless interference and adaptive transmission rates was then developed [9], along with a novel solution combining throughput and job completion time to efficiently improve the global performance of wireless transmissions. As pointed out in [9], channel allocation is a critical research issue for wireless DCNs.

All-Optical Switching

Owing to its super-high switching/transmission capacity, the optical fiber transmission system is considered one of the most suitable transmission technologies for DCN. Moreover, the super-high switching capacity and flexible multiplexing capability of all-optical switching provide the possibility of flattening the DCN architecture, even for a large-scale DC. All-optical switching techniques fall mainly into two categories: optical circuit switching (OCS) and optical packet switching (OPS).
• OCS is a relatively mature, market-ready technology, and can be used in the core switching layer to increase the switching capacity of DCNs while significantly alleviating the traffic burden. However, OCS is designed for static routing over pre-established lightpaths, and statically planned lightpaths cannot handle bursty DC traffic patterns, which leads to congestion on overloaded links. Since OCS is a coarse-grained switching technology operating at the level of a wavelength channel, it is inflexible and inefficient at switching bursty, fine-grained DCN traffic.
• OPS offers fine-grained and adaptive switching capability, but suffers from serious technological maturity problems due to the lack of high-speed OPS fabrics and all-optical buffers.

Hybrid Technologies

Compared to an optical-switching-based network, an electronic network exhibits better expandability but poorer energy efficiency. Hybrid electronic/optical-switching-based DCN tries to combine the advantages of both electronic-switching-based and optical-switching-based architectures [10, 11]. An electronic-switching-based network is used to transmit small amounts of delay-sensitive data, while an optical-switching-based network is used to transmit large volumes of traffic. Table 1 compares the features of various representative DCN architectures in terms of their interconnection technologies. As seen in Fig. 1, a suitable DCN architecture design should trade off application-specific bandwidth and scalability requirements against keeping deployment cost as low as possible.

Inter-DCN Communications

Nowadays, owing to the wide deployment of rich media applications via social networks and content delivery networks, the number of DCNs around the world is increasing rapidly, and a DC seldom works alone.
Thus, there is a demand to connect multiple DCNs placed at various strategic locations; the communications among them are referred to in this section as inter-DCN communications. When inter-DCN and intra-DCN communications are jointly considered, a two-level structure emerges: inside a DC, any of the architectures presented earlier can be selected for intra-DCN communications. In this section, we first survey alternative architectures for inter-DCN communications and then discuss the joint design across the two levels.

Since optical fiber provides large bandwidth, it can be utilized to interconnect multiple DCNs to relieve traffic congestion and imbalance. As with intra-DCN communications, it is hard to deploy OPS for inter-DCN communications due to the lack of high-speed optical packet switching fabrics and all-optical buffers. Instead, OCS is the better choice because of its technological maturity. Although OCS is characterized by slow reconfiguration, on the order of milliseconds, the traffic flow in the backbone network is relatively stable, so the cost of lightpath configuration can be amortized over backbone traffic streams of sufficiently long duration. Two OCS technologies are considered here for this purpose: wavelength-division multiplexing (WDM) and coherent optical orthogonal frequency-division multiplexing (CO-OFDM) [13].

Wavelength-Division Multiplexing - Traditionally, a single-carrier laser source is used as the WDM light source. To meet the requirements of mass data transmission, the number of laser sources must be increased, resulting in a sharp rise in cost and energy consumption. Thus, increasing the number of carriers in a WDM light source is a critical demand. In recent years, multicarrier source generation technology [14] and point-to-point WDM transmission systems have attracted much attention.
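The appeal of adding carriers is easy to quantify: aggregate WDM link capacity scales linearly with the channel count. A minimal back-of-envelope sketch (the per-channel rates below are illustrative assumptions, not figures from the article):

```python
def wdm_capacity_tbps(channels, per_channel_gbps):
    """Aggregate WDM link capacity: channel count x per-channel rate."""
    return channels * per_channel_gbps / 1000.0

# 1000 channels at 10 Gb/s each:
print(wdm_capacity_tbps(1000, 10))   # 10.0 (Tb/s)
# Reaching the ~100 Tb/s peak demand cited in [1] would need, e.g.,
# 1000 channels at 100 Gb/s each:
print(wdm_capacity_tbps(1000, 100))  # 100.0 (Tb/s)
```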
Thousand-channel dense WDM (DWDM) has also been demonstrated successfully. Based on a centralized multicarrier source, an optical broadcast-and-select network architecture is suggested in [14].

CO-OFDM Technology - As one of the OCS technologies, CO-OFDM shows great potential for reducing the construction cost of future DCNs. One great advantage of CO-OFDM is its all-optical traffic grooming capability, compared with legacy OCS networks where optical-to-electronic-to-optical (O/E/O) conversion is required. This is a critical feature for the interconnection of DCNs, where traffic is heterogeneous with extremely large diversity. Thus, CO-OFDM can improve bandwidth capacity with high efficiency and allocation flexibility. However, it suffers from the intrinsic network agility problem common to all existing OCS-based technologies.

Software-Defined Networking Switch

GatorCloud [15] was proposed to leverage the flexibility of software-defined networking (SDN) to boost DCNs to over 100 Gb/s bandwidth with cutting-edge SDN switches based on OpenFlow. SDN switches separate the data path (packet forwarding fabrics) from the control path (high-level routing decisions) to allow advanced networking and novel protocol designs; this decouples the decision-making logic and makes switches remotely programmable via SDN protocols. In this way, SDN turns switches into economical commodities. With SDN, networks can be abstracted and sliced for better control and optimization of the many demanding data-intensive or computation-intensive applications running over DCNs. Hence, SDN makes it possible to conceive a fundamentally different parallel and distributed computing engine that is deeply embedded in the DCN, realizing application-aware service provisioning. Also, SDN-enabled networks in DCNs can be adaptively provisioned to boost data throughput for specific applications during reserved time periods.
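The data-path/control-path split described above can be illustrated with a toy match-action table. The class and method names below are an illustrative sketch, not the actual OpenFlow API: the "switch" only performs table lookups, while rule installation stands in for the controller's role.

```python
class Switch:
    """Toy SDN switch: forwarding is a pure flow-table lookup."""

    def __init__(self):
        self.flow_table = {}                 # (src, dst) -> output port

    def install_rule(self, match, out_port):
        # In a real deployment, a remote controller pushes this rule
        # over a protocol such as OpenFlow.
        self.flow_table[match] = out_port

    def forward(self, pkt):
        match = (pkt["src"], pkt["dst"])
        # On a table miss, a real switch would punt the packet
        # to the controller for a routing decision.
        return self.flow_table.get(match, "to-controller")

sw = Switch()
sw.install_rule(("10.0.0.1", "10.0.0.2"), out_port=3)
print(sw.forward({"src": "10.0.0.1", "dst": "10.0.0.2"}))  # 3
print(sw.forward({"src": "10.0.0.9", "dst": "10.0.0.2"}))  # to-controller
```

The point of the split is visible in the code: all decision logic lives outside the `forward` path, so the forwarding element stays simple (and cheap) while behavior can be reprogrammed remotely.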
Currently, the main research issues are SDN-based flow scheduling and workload balancing, which remain unsolved.
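As one simple baseline for the flow-scheduling problem mentioned above, a greedy least-loaded assignment can be sketched as follows. The path names and demand units are illustrative assumptions; real SDN schedulers must additionally cope with flow churn, path capacity, and topology constraints:

```python
def schedule_flows(flows, paths):
    """Greedy least-loaded flow scheduling (illustrative baseline):
    assign each flow, largest demand first, to the currently
    least-loaded path. Returns the placement and per-path load."""
    load = {p: 0.0 for p in paths}
    placement = {}
    for name, demand in sorted(flows.items(), key=lambda kv: -kv[1]):
        best = min(load, key=load.get)     # least-loaded path so far
        placement[name] = best
        load[best] += demand
    return placement, load

flows = {"f1": 8.0, "f2": 5.0, "f3": 4.0, "f4": 2.0}   # demands in Gb/s
placement, load = schedule_flows(flows, ["p1", "p2"])
print(placement)  # {'f1': 'p1', 'f2': 'p2', 'f3': 'p2', 'f4': 'p1'}
print(load)       # {'p1': 10.0, 'p2': 9.0}
```

Even this toy version shows why the problem is considered open: greedy placement balances static demands reasonably well, but gives no guarantees once flows arrive and depart dynamically.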
doi:10.1109/mnet.2013.6574659