An architecture for exploiting multi-core processors to parallelize network intrusion prevention

Robin Sommer, Vern Paxson, Nicholas Weaver
2009 Concurrency and Computation  
It is becoming increasingly difficult to implement effective systems for preventing network attacks. Three trends combine to cause this: the rising sophistication of attacks, which require more complex analyses to detect; the relentless growth in the volume of network traffic that we must analyze; and, critically, the failure in recent years of uniprocessor performance to sustain the exponential gains that CPUs enjoyed for so many years. For commodity hardware, tomorrow's performance gains will instead
come from multi-core architectures in which a whole set of CPUs executes concurrently. Taking advantage of the full power of multi-core processors for network intrusion prevention requires an in-depth approach. In this work we frame an architecture customized for parallel execution of network attack analysis. At the lowest layer of the architecture is an 'Active Network Interface', a custom device based on an inexpensive FPGA platform. The analysis itself is structured as an event-based system, which allows us to find many opportunities for concurrent execution, since events introduce a natural asynchrony into the analysis while still maintaining good cache locality. A preliminary evaluation demonstrates the potential of this architecture.

... our applications in a highly parallel fashion: dividing the processing into concurrent tasks while minimizing inter-task communication. In our previous work with colleagues [9], we have argued that we can extract a potentially enormous degree of parallelism from the task of network security monitoring. However, doing so requires rethinking how we pursue the parallelism. Historically, parallelization of intrusion detection/prevention analysis has been confined to coarse-grained load-balancing (with little or no fine-grained communication between the analysis units) and fast string-matching. These approaches buy some initial speed-ups, but Amdahl's law prevents significant gains for more sophisticated analyses that require fine-grained coordination.

Taking advantage of the full power of multi-core processors requires a more in-depth approach. Obviously, we need to structure the processing into separate, low-level threads that are suitable for concurrent execution. To do so, however, we need to address a number of issues:

• To provide intrusion prevention functionality (i.e. active blocking of malicious traffic), we must ensure that packets are only forwarded if all relevant processing gives approval.
• To perform global analysis (e.g. scan detection [10,11], worm contact graphs [12], stepping-stone detection [13], content sifting [14], botnet command-and-control [15]) we must support exchange of state across threads, but we must minimize such inter-thread communication to maximize performance.
• Similarly, we must understand how the memory locality of different forms of analysis interacts with the ways in which caches are shared across threads within a CPU core and across cores. We need to be able to express the analysis in a form that is independent of the memory and threading parameters of a given CPU, so we can automatically retarget the implementations of analysis algorithms to different configurations.
• We must ensure that our approach is amenable to analysis by performance debugging tools that can illuminate the presence of execution bottlenecks such as those due to memory or messaging patterns.

OVERVIEW

We begin our discussion with an overview of the architecture we envision; Figure 1 illustrates its overall structure. At the bottom of the diagram is the 'active network interface' (ANI). This
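
The first requirement in the list of issues above (forward a packet only if all relevant processing gives approval) can be illustrated with a minimal sketch. This is not the paper's implementation and does not model the ANI hardware. The analyzer functions, the packet dictionary layout, and the names `should_forward` and `filter_packets` are hypothetical, chosen only for illustration. Python threads are used to show the structure; in CPython they do not provide true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical analyzers: each returns True to approve a packet.
def payload_size_ok(packet):
    return len(packet["payload"]) <= 1500

def source_not_blocked(packet):
    return packet["src"] not in {"203.0.113.7"}

def no_known_signature(packet):
    return b"/etc/passwd" not in packet["payload"]

ANALYZERS = [payload_size_ok, source_not_blocked, no_known_signature]

def verdict(future):
    """Read one analyzer's verdict; an analyzer error counts as a rejection."""
    try:
        return bool(future.result())
    except Exception:
        return False  # fail closed: never forward on an analysis error

def should_forward(packet, pool, analyzers=ANALYZERS):
    """Run all analyzers concurrently; approve only if every one approves."""
    futures = [pool.submit(analyze, packet) for analyze in analyzers]
    return all(verdict(f) for f in futures)

def filter_packets(packets, workers=4):
    """Return the packets that every analyzer approved, in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for p in packets if should_forward(p, pool)]

if __name__ == "__main__":
    packets = [
        {"src": "198.51.100.1", "payload": b"GET /index.html"},
        {"src": "203.0.113.7", "payload": b"GET /index.html"},
        {"src": "198.51.100.2", "payload": b"GET /../../etc/passwd"},
    ]
    for p in filter_packets(packets):
        print("forward", p["src"])
```

The gate fails closed: a packet is dropped if any analyzer rejects it or raises an error, which is one possible reading of "all relevant processing gives approval".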
doi:10.1002/cpe.1422 fatcat:7yq5tk4n2neenmtbpr5jnkeatq