Size Oblivious Programming with InfiniMem [chapter]

Sai Charan Koduru, Rajiv Gupta, Iulian Neamtiu
2016 Lecture Notes in Computer Science  
Many recently proposed BigData processing frameworks make programming easier, but typically expect the datasets to fit in the memory of either a single multicore machine or a cluster of multicore machines. When this assumption does not hold, these frameworks fail. We introduce the InfiniMem framework that enables size oblivious processing of large collections of objects that do not fit in memory by making them disk-resident. InfiniMem is easy to program with: the user just indicates the large collections of objects that are to be made disk-resident, while InfiniMem transparently handles their I/O management. The InfiniMem library can manage a very large number of objects in a uniform manner, even though the objects have different characteristics and relationships which, when processed, give rise to a wide range of access patterns requiring different organizations of data on the disk. We demonstrate the ease of programming and versatility of InfiniMem with 3 different probabilistic analytics algorithms and 3 different size oblivious graph processing frameworks; they require minimal effort, 6-9 additional lines of code. We show that InfiniMem can successfully generate a mesh with 7.5 million nodes and 300 million edges (4.5 GB on disk) in 40 minutes, and that it performs the PageRank computation on a 14 GB graph with 134 million vertices and 805 million edges at 14 minutes per iteration on an 8-core machine with 8 GB RAM. Many graph generators and processing frameworks cannot handle such large graphs. We also exploit InfiniMem on a cluster to scale up an object-based DSM.
doi:10.1007/978-3-319-29778-1_1 fatcat:4a6sejimqrdmfbuatq7c3mrdsu