
Higher-order and tuple-based massively-parallel prefix sums

Sepideh Maleki, Annie Yang, Martin Burtscher
2016 Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2016  
SAM outperforms CUB by up to a factor of 2.9 on higher-order prefix sums and by up to a factor of 2.6 on tuple-based prefix sums.  ...  This paper discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums.  ...  National Science Foundation under grants 1217231, 1406304, and 1438963, a REP grant from Texas State University, and hardware donations from NVIDIA Corporation.  ... 
doi:10.1145/2908080.2908089 dblp:conf/pldi/MalekiYB16 fatcat:wx5xht6vozey3pxqbcq6fex4uy

Higher-order and tuple-based massively-parallel prefix sums

Sepideh Maleki, Annie Yang, Martin Burtscher
2016 SIGPLAN notices  
SAM outperforms CUB by up to a factor of 2.9 on higher-order prefix sums and by up to a factor of 2.6 on tuple-based prefix sums.  ...  This paper discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums.  ...  National Science Foundation under grants 1217231, 1406304, and 1438963, a REP grant from Texas State University, and hardware donations from NVIDIA Corporation.  ... 
doi:10.1145/2980983.2908089 fatcat:2mkozrljg5d73oqccoeouorarq

GeoTrie: A scalable architecture for location-temporal range queries over massive geotagged data sets

Rudyar Cortes, Xavier Bonnaire, Olivier Marin, Luciana Arantes, Pierre Sens
2016 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA)  
The proliferation of GPS-enabled devices leads to the massive generation of geotagged data sets recently known as Big Location Data.  ...  It allows users to explore and analyse data in space and time, and requires an architecture that scales with the insertions and location-temporal queries workload from thousands to millions of users.  ...  Input: Tuple key T_k; Output: Leaf node; lower = 0; higher = D; while lower ≤ higher do middle = (lower + higher)/2; // Extract the prefix of size middle of every coordinate of T_k and route the message node =  ... 
doi:10.1109/nca.2016.7778584 dblp:conf/nca/CortesBMAS16 fatcat:nayeecnitbbkfff6ruf4v54lz4

Massively parallel sort-merge joins in main memory multi-core database systems

Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann
2012 Proceedings of the VLDB Endowment  
We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting.  ...  Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing.  ...  The prefix sums ps_i per worker W_i, which are computed from the combined local histograms, are essential for the synchronization-free parallel scattering of the tuples into their range partition.  ... 
doi:10.14778/2336664.2336678 fatcat:6hgp4wvslzfgzb7hd77qa6u7ou

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems [article]

Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann
2012 arXiv   pre-print
We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting.  ...  Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing.  ...  The prefix sums ps_i per worker W_i, which are computed from the combined local histograms, are essential for the synchronization-free parallel scattering of the tuples into their range partition.  ... 
arXiv:1207.0145v1 fatcat:wgtlxq4uqjcgtnp7grqdmi6qma

Query processing on prefix trees live

Thomas Kissinger, Benjamin Schlegel, Dirk Habich, Wolfgang Lehner
2013 Proceedings of the 2013 international conference on Management of data - SIGMOD '13  
Current in-memory databases are usually columnstores that exchange columns or vectors between operators and suffer from a high tuple reconstruction overhead.  ...  To keep the intermediate index materialization costs low, we employ optimized prefix trees that offer a balanced read/write performance.  ...  Configuring the prefix tree with a higher k and adding batch processing will result in a performance better than the hash tables.  ... 
doi:10.1145/2463676.2463682 dblp:conf/sigmod/KissingerSHL13 fatcat:mgcmplvp7jeondvm4y7gx27eou

Relational joins on graphics processors

Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga Govindaraju, Qiong Luo, Pedro Sander
2008 Proceedings of the 2008 ACM SIGMOD international conference on Management of data - SIGMOD '08  
Our GPU-based algorithms are able to achieve 2-20 times higher performance than their CPU-based counterparts.  ...  Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU and use parallel computation to effectively hide the memory latency.  ...  GPUs can be regarded as massively parallel processors with 10x faster computation and 10x higher memory bandwidth than CPUs [3].  ... 
doi:10.1145/1376616.1376670 dblp:conf/sigmod/HeYFLGLS08 fatcat:hbgfjpse25d4fp7ntnscnh7cvq

Relational query coprocessing on graphics processors

Bingsheng He, Mian Lu, Ke Yang, Rui Fang, Naga K. Govindaraju, Qiong Luo, Pedro V. Sander
2009 ACM Transactions on Database Systems  
Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth.  ...  He et al. utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls.  ...  Second, we compute a prefix sum on flag, and store the prefix sum into another array.  ...  3.1.2 Access methods.  ... 
doi:10.1145/1620585.1620588 fatcat:ah2bp43ac5cabcq5xxvos4yt3a

Massive Parallelization of Massive Sample-size Survival Analysis [article]

Jianxiao Yang, Martijn J. Schuemie, Marc A. Suchard
2022 arXiv   pre-print
by orders-of-magnitude as compared to traditional multi-core CPU parallelism.  ...  In this paper, we use graphics processing units (GPUs) to parallelize the computational bottlenecks of massive sample-size survival analyses.  ...  operations, and S_pre[ν] as the prefix sum of arbitrary vector ν.  ... 
arXiv:2204.08183v1 fatcat:cl36jsu6bjcmpofa5hgrnxzmui

How to barter bits for chronons

Allison L. Holloway, Vijayshankar Raman, Garret Swart, David J. DeWitt
2007 Proceedings of the 2007 ACM SIGMOD international conference on Management of data - SIGMOD '07  
Data warehouse systems have found that they can avoid the unpredictability of joins and indexing and achieve good performance by using massive parallel processing to perform scans over compressed vertical  ...  We investigate a variety of compression formats and propose two novel optimizations: tuple length quantization and a field length lookup table, for efficiently processing variable length fields and tuples  ...  The authors would like to thank Ken Ross and the anonymous reviewers for helpful feedback on the paper.  ... 
doi:10.1145/1247480.1247525 dblp:conf/sigmod/HollowayRSD07 fatcat:yytcwpjhizeajjq2lftpcsekla

X-device query processing by bitwise distribution

Holger Pirk, Thibault Sellam, Stefan Manegold, Martin Kersten
2012 Proceedings of the Eighth International Workshop on Data Management on New Hardware - DaMoN '12  
Each of the resulting bit-partitions is stored and processed on one of the available devices.  ...  While pleasantly simple, this strategy has a number of problems: it may leave the "inappropriate" devices idle while overloading the "appropriate" device and putting a high pressure on the PCI bus.  ...  Fast sequential execution based on behavior prediction (pipelining, prefetching, branch prediction, ...) is replaced by simple, yet massively parallel, execution.  ... 
doi:10.1145/2236584.2236591 dblp:conf/damon/PirkSMK12 fatcat:h2pxtjiplbhspa7tyswkk2evbe

Parallel Computation of Component Trees on Distributed Memory Machines

Markus Gotz, Gabriele Cavallaro, Thierry Geraud, Matthias Book, Morris Riedel
2018 IEEE Transactions on Parallel and Distributed Systems  
A novel tuple-based merging scheme allows to merge the acquired partial images into a globally correct view.  ...  This work proposes a new efficient hybrid algorithm for the parallel computation of two particular component trees, the max- and min-tree, in shared and distributed memory environments.  ...  ACKNOWLEDGMENTS The authors would like to thank Igancio Toledo and Martin Kornmesser for making the ESO/VVV Survey/D. Minniti image with the id eso1242a publicly available.  ... 
doi:10.1109/tpds.2018.2829724 fatcat:mvb4yiv47rgwrbz5r5g53x6r4y

Fast Parallel Suffix Array on the GPU [chapter]

Leyuan Wang, Sean Baxter, John D. Owens
2015 Lecture Notes in Computer Science  
The second, a hybrid skew and prefix-doubling implementation, is the first of its kind on the GPU and achieves a speedup of 2.3-4.4x over Osipov's prefix-doubling and 2.4-7.9x over our skew implementation  ...  Our implementations rely on two efficient parallel primitives, a merge and a segmented sort.  ...  Fast Parallel Suffix Array on the GPU  ... 
doi:10.1007/978-3-662-48096-0_44 fatcat:q3dlbm6enrhrboksyni4y5pxqy

Adaptive MapReduce Similarity Joins [article]

Samuel McCauley, Francesco Silvestri
2018 arXiv   pre-print
Hu, Tao, and Yi (PODS 17) investigated joins in a massively parallel setting, showing strong results that adapt to the size of the output.  ...  Recent research has investigated how locality-sensitive hashing (LSH) can be used for similarity join, and in particular two recent lines of work have made exciting progress on LSH-based join performance  ...  We also thank the participants of the AlgoPARC Workshop on Parallel Algorithms and Data Structures, in part supported by the NSF grant no. 1745331.  ... 
arXiv:1804.05615v1 fatcat:7wwnljdftze2fi252lugrkldxq

Adaptive MapReduce Similarity Joins

Samuel McCauley, Francesco Silvestri
2018 Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond - BeyondMR'18  
Hu, Tao, and Yi (PODS 17) investigated joins in a massively parallel setting, showing strong results that adapt to the size of the output.  ...  Recent research has investigated how locality-sensitive hashing (LSH) can be used for similarity join, and in particular two recent lines of work have made exciting progress on LSH-based join performance  ...  We also thank the participants of the AlgoPARC Workshop on Parallel Algorithms and Data Structures, in part supported by the NSF grant no. 1745331.  ... 
doi:10.1145/3206333.3206340 dblp:conf/sigmod/McCauley018 fatcat:fuqwf45pwrhnfol4uwvy46a73q