Fei Chen, Tere Gonzalez, Jun Li, Manish Marwah, Jim Pruyne, Krishnamurthy Viswanathan, Mijung Kim
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
Hadoop and its variants have been widely used for processing large-scale analytics tasks in a cluster environment. However, use of a commodity cluster for analytics tasks needs to be reconsidered based on two key observations: (1) in recent years, large-memory, multicore machines have become more affordable; and (2) recent studies show that most analytics tasks in practice are smaller than 100 GB. Thus, replacing a commodity cluster with a large-memory, multicore machine can enable in-memory analytics at an affordable cost. However, programming on a big-memory, multicore machine is a challenge. Multi-threaded programming is notoriously difficult. Further, the memory design of most large-memory servers follows a non-uniform memory access (NUMA) architecture. While NUMA-aware programming often leads to high efficiency in analytics tasks, it is usually done in an ad hoc manner. In this demo, we present Palette, an analytics framework that exploits large memory to trade space for time while also addressing the challenges of multi-threaded, NUMA-aware programming. Palette manages multiple, index-like data representations for input datasets. An operator may have multiple implementations, each of which uses a different data representation. Palette uses a cost-based approach to automatically select the fastest one on a given dataset. Palette addresses the challenges of multi-threaded and NUMA-aware programming by adapting Hadoop for a single multicore machine and modifying it to account for the characteristics of modern NUMA hardware. Users can write programs using exactly the same APIs as those used in traditional Hadoop, while transparently benefiting from the multi-threaded and NUMA-aware infrastructure. We have developed a research prototype of Palette. Specifically, at SIGMOD we will demonstrate how to (1) create an operator, such as time series similarity search, on Palette, (2) execute the operator with Palette's automatic implementation selection feature, and (3) monitor and compare different operator implementations.
doi:10.1145/2588555.2594509 dblp:conf/sigmod/ChenGLMPVK14 fatcat:ezdrp7vi5zauvdyik3qc2kdxmu