Realistic Large-Scale Online Network Simulation

X. Liu, A. A. Chien
2006 The International Journal of High Performance Computing Applications
Large-scale network simulation is an important technique for studying the dynamic behavior of networks, network protocols, and emerging classes of distributed applications (e.g., Grid and peer-to-peer). Scale and realism are two critical requirements for network simulations used in Grid application studies. Our work here extends previous efforts in three key ways. First, we study networks 100x larger than in our previous studies (20,000 routers). Second, at this scale, we study realistic
network structures (100 ASes, BGP4 and OSPF routing) versus flat OSPF routing. Finally, we describe and evaluate a new load-balancing approach called hierarchical profile-based load balance (HPROF). Our extensive large-scale experiments with profile-based load balance (PROF) on flat-routed (OSPF) networks show that PROF outperforms several other techniques based on topology and static application information. However, these results and those for multi-AS networks motivate our invention of a new hierarchical technique (HPROF), which first clusters network nodes to achieve a desired minimum link latency (MLL), a key determinant of simulation parallelism, and then applies the graph partitioner. HPROF explicitly controls the tradeoff between simulation efficiency and available parallelism, producing robust and superior performance for large-scale networks, both single-AS and multi-AS. HPROF can improve load imbalance by 40% and reduce simulation time by about 50% in our 20,000-router simulations executed on 128-node clusters. The parallel efficiency achieved by these simulations is over 40%, providing substantial capabilities for simulating large networks. In summary, these advances demonstrate that realistic large-scale simulation of networks of 20,000 routers (comparable to a large Tier-1 ISP network like AT&T) can be accomplished with our system.

Introduction

Historically, network simulations and emulations have been used extensively to explore the behavior of network protocols [1-3]. Because of the difficulty of modeling application behavior in detail, most of these simulations use simple application models to exercise the protocols and networks. However, with the advent of large numbers of applications that tightly couple the use of compute, storage, and network resources, techniques to study these resources together are emerging.
In particular, large-scale network simulation is an important technique for studying the dynamic behavior of networks, network protocols, and emerging classes of distributed applications, including peer-to-peer [4] and Grid applications [5], where the network is an important contributor to application performance, applications generate large amounts of network traffic, and overall application performance is critical. A wide variety of simulation systems have been built to model network behavior based on discrete event simulation [6-9]. MaSSF, a network simulation tool [10], is a key component of the MicroGrid system [11] built by our group at UCSD to study the dynamic behavior of Grid applications. The MicroGrid enables the execution of complete Grid or distributed applications. There are two key requirements for a network simulator targeted at large-scale study of such applications and resource infrastructures. The first requirement is that it must scale to Internet-scale networks. As in many other network simulation projects, MaSSF uses cluster systems to achieve scalable performance. Because it harnesses scalable compute resources, the MaSSF system together with user applications is itself an interesting distributed application, and load balance of the network simulation is a key problem for scalability. In our previous work [10], we formulated the load balance problem as a graph partitioning problem and applied classical graph partitioning algorithms [12-15] to solve it. Three approaches, exploiting topology only, topology plus application placement, and profile data, were presented and evaluated for moderate-sized networks. The results showed that exploiting static topology and application placement information improves load balance, and that a profile-based approach improves it further.
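To make the load-balance formulation concrete, the following is a minimal sketch, not the MaSSF implementation. It assumes that the "profile" is a per-router load figure (for example, event counts from an earlier run) and that the clustering step the abstract attributes to HPROF can be approximated by merging the endpoints of every link whose latency falls below a target MLL. The function names, the sample topology, and the load numbers are all hypothetical. The assignment step is a simple heaviest-first greedy heuristic standing in for the classical graph partitioners cited in the text; it balances load only and does not try to minimize the number of links cut.

```python
from collections import defaultdict


def cluster_by_min_latency(nodes, links, mll):
    """Merge the endpoints of every link with latency below mll (union-find).

    Afterwards, any link that still crosses two clusters has latency >= mll.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b, latency in links:
        if latency < mll:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for n in nodes:
        clusters[find(n)].append(n)
    return list(clusters.values())


def greedy_partition(clusters, load, k):
    """Assign clusters to k simulation engines, heaviest cluster first,
    always choosing the engine with the smallest load so far."""
    weights = [sum(load[n] for n in c) for c in clusters]
    order = sorted(range(len(clusters)), key=lambda i: weights[i], reverse=True)
    parts = [[] for _ in range(k)]
    totals = [0.0] * k
    for i in order:
        target = totals.index(min(totals))
        parts[target].extend(clusters[i])
        totals[target] += weights[i]
    return parts, totals


def load_imbalance(totals):
    """Maximum engine load divided by mean engine load (1.0 is perfect balance)."""
    return max(totals) / (sum(totals) / len(totals))


# Hypothetical six-router chain; latencies in milliseconds, loads in events.
nodes = ["r1", "r2", "r3", "r4", "r5", "r6"]
links = [("r1", "r2", 0.1), ("r2", "r3", 5.0), ("r3", "r4", 0.2),
         ("r4", "r5", 5.0), ("r5", "r6", 5.0)]
load = {"r1": 40, "r2": 10, "r3": 30, "r4": 20, "r5": 25, "r6": 25}

clusters = cluster_by_min_latency(nodes, links, mll=1.0)
parts, totals = greedy_partition(clusters, load, k=2)
print(parts, totals, load_imbalance(totals))
```

Raising `mll` in this sketch merges more routers into each cluster, which leaves fewer, coarser units to distribute; lowering it leaves more freedom to balance load but allows low-latency links to cross engine boundaries. This mirrors the tradeoff between simulation efficiency and available parallelism described in the abstract.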
In this paper, we improve on all of these with a new hierarchical approach and evaluate all of them on much larger networks (100x). The second requirement for large-scale network simulation is that it must simulate in detail the structure of realistic networks. Our previously published work on MaSSF [10] addresses simulation accuracy (validation); in this paper we address realistic network topology and routing selection. While much research explores realistic Internet-like topology generators and background traffic, few efforts explore realistic network routing, with most large-scale simulations pursuing only shortest-path routing (OSPF). It is well known that in large, multi-AS networks, routing among different AS domains is controlled by BGP and policy routing; therefore connectivity does not equal reachability. A realistic
doi:10.1177/1094342006067814 fatcat:wxs6adj3gzf3jp4hdfvltbdcwm