Michael E. Fisk, Curtis L. Hash
2014 Proceedings of the Fourth International Workshop on Cloud Data and Platforms - CloudDP '14  
In this paper we present FileMap, an open-source, alternative map-reduce-based computing system that we have developed and used over the last 5 years. The system features several significant design decisions and performance aspects that are not found in prevalent map-reduce systems such as Hadoop [16]. The prevailing design goal is a system that schedules and orchestrates parallel and distributed data processing but does not interpose itself between data and the serial programs that process data. FileMap manages the organization of input and output files and the scheduling of program execution, but it does not process files itself and is agnostic to the format in which data is stored or indexed. We layer FileMap on top of existing, ubiquitous file systems, security models, and network access software in order to minimize its complexity and maximize the ability of its users to benefit from specialized compute platforms, file systems, and software. We measure the performance of FileMap in several instantiations, including a heterogeneous "cloud" conglomeration of computers and storage distributed across multiple owning organizations with no cross-organization trust or synchronization. This "cloud" model intentionally supports distributed sensor systems in which nodes collect their own data and participate in its analysis by moving map/reduce processing upstream to where the data is collected. Our measurements on SMP systems, clusters built for Hadoop, and this distributed cloud show that FileMap outperforms more prevalent computing systems and models by factors between 2x (compared to Hadoop) and 14x (cloud vs. centralized).
doi:10.1145/2592784.2592790 dblp:conf/eurosys/FiskH14 fatcat:gsgtsdfmrfc6jlkbc3r5f53csy