Harnessing Multicore Processors for High-Speed Secure Transfer

John Bresnahan, Rajkumar Kettimuthu, Mike Link, Ian Foster
2007 High-Speed Networks Workshop
A growing need for ultra high-speed data transfers has motivated continued improvements in physical-layer transmission speeds. However, as researchers develop protocols and software to operate over such networks, they often fail to account for security. The processing power required to encrypt or sign packets of data can significantly decrease transfer rates, and thus security is often sacrificed for throughput. Emerging multicore processors provide a higher ratio of CPUs to network interfaces and can in principle be used to accelerate encrypted transfers by applying multiple processing and network resources to a single transfer. We discuss the attributes that network protocols and software must have to exploit such systems. In particular, we study how these attributes may be applied in the GridFTP code distributed with the Globus Toolkit. GridFTP is a well-accepted and robust protocol for high-speed data transfer, and it has been shown to scale to near network speeds. While GridFTP can provide encrypted and integrity-protected data transfers, it has historically suffered performance penalties when these features are enabled. We present configurations of the Globus GridFTP server that achieve fully encrypted high-speed data transfers.

Index Terms—Secure data transfer, GridFTP, Encryption, Parallel streams

I. EXTENDED ABSTRACT

To achieve processing parallelism on protected or encrypted transfers, we must ensure that security processing is performed on different portions of the data stream at the same time and on different CPUs. While simple in principle, this concept presents interesting problems for network protocols and software implementations. Encryption protocols that use cipher block chaining, such as TLS/SSL [1], require that data be decrypted in the same order in which it was encrypted. Further, the way the bytes of a stream are processed varies with their position in the stream: what matters is not only the value of the byte being processed, but also when it is processed. These properties make it difficult to break a stream into pieces for parallelization. For the reasons described above, we cannot take portions of a single data stream and process them in parallel against the same security context. To follow the secure protocol correctly, no byte can be processed until the previous byte has been processed, so there can be no parallelism within a single security context.
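The in-order constraint can be made concrete with a toy cipher-block-chaining routine. The XOR "cipher" below is an illustration only, not a real cipher: the point is that each ciphertext block is mixed with the previous ciphertext block, so block i cannot be produced or decrypted before block i-1, and identical plaintext blocks yield different ciphertext depending on their position in the stream.

```python
# Toy XOR-based CBC sketch: NOT a real cipher, only an illustration of why
# cipher block chaining forces in-order processing within one security context.

BLOCK = 4  # toy block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev = iv
    out = []
    for i in range(0, len(plaintext), BLOCK):
        block = plaintext[i:i + BLOCK]
        # Each ciphertext block depends on the PREVIOUS ciphertext block,
        # so block i cannot be produced before block i-1 exists.
        prev = xor(xor(block, prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev = iv
    out = []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Decryption likewise needs the previous ciphertext block first.
        out.append(xor(xor(block, key), prev))
        prev = block
    return b"".join(out)
```

Running this on a plaintext of two identical blocks shows the position dependence described above: the two ciphertext blocks differ, and decryption must consume them in order.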
We can solve this problem by creating many distinct security contexts for a single data transfer. A simple way to realize this approach with existing network protocols is to use parallel streams. Parallel streams are common in data transfers as a means of network optimization [2, 3]: to minimize the penalties associated with TCP slow start and dropped packets, many TCP streams are used for the same logical transfer, reducing the cost of any one packet loss. This technique can also be leveraged for parallel encryption. Each stream has its own security context and is independent with regard to security processing; thus, we can achieve parallel security processing.

A. Related Work

Hardware accelerators have been used to address SSL performance problems. An accelerator is a card that plugs into a PCI slot or SCSI port and contains a co-processor that performs part of the SSL processing. Network interface cards with offloaded SSL and IPSec [4] have also been produced. We want to achieve high-speed secure transfers with general-purpose hardware so that the approach can be used more widely. We expect that multicore processors will become more common than SSL/IPSec offload engines. Further, we would like to exploit the parallelism and higher processing power that multicore technology promises in order to achieve high-speed secure data transfers. Moreover, the offload techniques do not help achieve parallel security processing on a single node.

B. Asynchronous Event Model

Solving this problem in software requires some type of threaded I/O model. To have many parallel data streams processed at once, multiple threads of execution must run on different CPUs. This can happen via threads in a single user process or by using multiple processes. The Globus Toolkit achieves this type of parallelism via an asynchronous event model and thread pools [5]. We present here the advantages of the asynchronous thread-pool model.
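The one-context-per-stream idea can be sketched as follows. This is an illustration, not the Globus implementation: a toy hash-based keystream (the `StreamContext` class and `transfer` helper are hypothetical names) stands in for a real chained cipher. Within each stream the state must evolve byte by byte, but because every stream gets a fresh, independent context, the streams can be processed concurrently on a thread pool.

```python
# Sketch of parallel security processing via per-stream security contexts.
# StreamContext is a toy hash-chained keystream, NOT a real cipher.
import hashlib
from concurrent.futures import ThreadPoolExecutor


class StreamContext:
    """Toy keystream whose state advances once per byte, mimicking the
    in-order constraint a chaining cipher imposes within one stream."""

    def __init__(self, key: bytes):
        self.state = hashlib.sha256(key).digest()

    def process(self, data: bytes) -> bytes:
        out = bytearray()
        for b in data:
            out.append(b ^ self.state[0])
            # State for byte i depends on the state after byte i-1:
            # bytes within a stream must be processed strictly in order.
            self.state = hashlib.sha256(self.state).digest()
        return bytes(out)


def transfer(payload: bytes, n_streams: int, key: bytes) -> bytes:
    """Split one logical transfer across n_streams independent streams."""
    chunk = (len(payload) + n_streams - 1) // n_streams
    parts = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        # One fresh context per stream: no ordering dependency ACROSS
        # streams, so the pool can run them on different CPUs in parallel.
        done = pool.map(lambda part: StreamContext(key).process(part), parts)
        return b"".join(done)
```

Because the toy keystream is a symmetric XOR, applying `transfer` twice with the same key and stream count recovers the original payload, provided both sides partition the data the same way, just as each real parallel stream must negotiate its own context end to end.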
In an asynchronous event model, the software developer posts I/O requests to the system. When a request is fulfilled (or an error occurs), the user is notified via a callback function that the developer defines in their own process space.
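The post-and-callback pattern can be sketched as below. This is a minimal illustration of the model, not the actual Globus callback API; the `AsyncIOSystem` class and `post_read` method are hypothetical names. The caller posts a read request and returns immediately; a thread-pool worker performs the I/O and invokes the user-defined callback with either the data or the error.

```python
# Minimal sketch of an asynchronous event model backed by a thread pool
# (an illustration of the pattern, not the Globus Toolkit's API).
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncIOSystem:
    def __init__(self, workers: int = 4):
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def post_read(self, source, nbytes, callback):
        """Post an I/O request and return immediately. The user-supplied
        callback(data, error) fires when the request completes or fails."""
        def run():
            try:
                data = source.read(nbytes)
                callback(data, None)   # success: deliver the data
            except Exception as exc:
                callback(None, exc)    # failure: deliver the error
        self.pool.submit(run)
```

Dispatching callbacks from pool workers is what lets several posted requests, and hence several independently secured streams, make progress on different CPUs at once.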
doi:10.1109/hsnw.2007.4290546