Accelerating text mining workloads in a MapReduce-based distributed GPU environment

Peter Wittek, Sándor Darányi
2013 Journal of Parallel and Distributed Computing  
Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data-intensive, and the ease of deployment of algorithms is an important factor in developing advanced applications, we introduce a flexible, distributed, MapReduce-based text mining workflow that performs ... d operations on CPUs with industry-standard tools and then runs compute-bound operations on GPUs which are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a cluster of eight nodes with two NVIDIA Tesla M2050 GPUs attached to each, and we achieve considerable speedups for random projection and self-organizing maps.
doi:10.1016/j.jpdc.2012.10.001 fatcat:tem562gscfgqlpea3n6quj3qtq