An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory

Stefan Schuh, Xiao Chen, Jens Dittrich
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Finally, we conclude with a list of major lessons learned from our study and a guideline for practitioners implementing massive main-memory joins.  ...  We derive improved variants of state-of-the-art join algorithms by applying optimizations like software write combine buffers, various hash table implementations, as well as NUMA-awareness in terms of data  ...  We see that the partition-based joins indeed lead to a dramatic reduction in cache misses and reach a cache hit rate of up to 99% for the join phase.  ... 
doi:10.1145/2882903.2882917 dblp:conf/sigmod/SchuhCD16 fatcat:wtowv5mpdnhvxc2ayhvgufwaky


Steven Keith Begley, Zhen He, Yi-Ping Phoebe Chen
2012 Proceedings of the 2012 international conference on Management of Data - SIGMOD '12  
Extensive experimental results show that MCJoin outperforms a naive memory constrained version of the state-of-the-art Radix-Clustered Hash Join algorithm in all of the situations tested, with margins  ...  In contrast, we propose a Memory Constrained Join algorithm (MCJoin) which is both high performing and also performs all of its operations within a tight given memory constraint.  ...  , where tuples from input relations are scattered (based on a hash function applied to a key attribute) into hash table partitions.  ... 
doi:10.1145/2213836.2213851 dblp:conf/sigmod/BegleyHC12 fatcat:q6pztmlgobd5tlywpzjl4hj74u

The DataPath system

Subi Arumugam, Alin Dobra, Christopher M. Jermaine, Niketan Pansare, Luis Perez
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
In this paper, we describe a purely-push based, research prototype database system called DataPath. DataPath is "data-centric". In DataPath, queries do not request data.  ...  First, requests for data naturally incur high latency as the data are pulled through the memory hierarchy, and second, it makes it difficult or impossible for multiple queries or operations that are interested  ...  When a chunk streams into the LHS of the join, each of the tuples in the chunk are hashed, and a lookup in the RHS hash table is performed.  ... 
doi:10.1145/1807167.1807224 dblp:conf/sigmod/ArumugamDJPP10 fatcat:sqcrkdyl5zhu5g7lcatpgwszru

Massively Parallel NUMA-Aware Hash Joins [chapter]

Harald Lang, Viktor Leis, Martina-Cezara Albutiu, Thomas Neumann, Alfons Kemper
2015 Lecture Notes in Computer Science  
We then develop a NUMA-aware hash join for massively parallel environments, and show how the specific implementation details affect the performance on a NUMA system.  ...  Our experimental evaluation shows that a carefully engineered hash join implementation outperforms previous high performance hash joins by a factor of more than two, resulting in an unprecedented throughput  ...  This join method improves cache locality by continuously partitioning into ever smaller chunks that ultimately fit into the cache. Ailamaki et al.  ... 
doi:10.1007/978-3-319-13960-9_1 fatcat:g72my4iak5eyfjv7msepxs4do4

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems [article]

Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann
2012 arXiv   pre-print
to hash joins.  ...  We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting.  ...  The recently proposed Wisconsin hash join [2] is based on a global shared hash table which has to be built across the NUMA partitions by a large number of threads.  ... 
arXiv:1207.0145v1 fatcat:wgtlxq4uqjcgtnp7grqdmi6qma

Massively parallel sort-merge joins in main memory multi-core database systems

Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann
2012 Proceedings of the VLDB Endowment  
to hash joins.  ...  We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting.  ...  The recently proposed Wisconsin hash join [2] is based on a global shared hash table which has to be built across the NUMA partitions by a large number of threads.  ... 
doi:10.14778/2336664.2336678 fatcat:6hgp4wvslzfgzb7hd77qa6u7ou

Inspector Joins

Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
2005 Very Large Data Bases Conference  
The key idea behind Inspector Joins is that during the I/O partitioning phase of a hash-based join, we have the opportunity to look at the actual data itself and then use this knowledge in two ways: (1  ...  We show how inspector joins, employing novel statistics and specialized indexes, match or exceed the performance of state-of-the-art cache-friendly hash join algorithms.  ...  The source cache lines of different filters in the horizontal layout are not contiguous in memory, while the destination block is a continuous chunk of memory.  ... 
dblp:conf/vldb/ChenAGM05 fatcat:vs6ofw37jffezizce7dd7ve4ou

Multi-core, main-memory joins

Cagri Balkesen, Gustavo Alonso, Jens Teubner, M. Tamer Özsu
2013 Proceedings of the VLDB Endowment  
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join.  ...  This claim is justified based on the width of SIMD instructions (sort-merge outperforms radix-hash join once SIMD is sufficiently wide), and NUMA awareness (sort-merge is superior to hash join in NUMA  ...  For handling skew in parallel radix hash join, we previously proposed a fine-granular task decomposition method.  ... 
doi:10.14778/2732219.2732227 fatcat:v6q7kdmarbfkpcyaj47xcp2ltq

A generic front-stage for semi-stream processing

M. Asif Naeem, Gerald Weber, Gillian Dobbie, Christof Lutteroth
2013 Proceedings of the 22nd ACM international conference on Conference on information & knowledge management - CIKM '13  
We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications.  ...  We analyze our approach in the context of a seminal semi-stream join, MESHJOIN (Mesh Join), and provide a cost model for the resulting semi-stream join algorithm, which we call CMESHJOIN (Cached Mesh Join  ...  It uses a two-level hash table for attempting to join stream tuples as soon as they arrive, and uses a partition-based waiting area for other stream tuples.  ... 
doi:10.1145/2505515.2505734 dblp:conf/cikm/NaeemWDL13 fatcat:sp6anmxsszbdvk7frxp63xkx74

Design and evaluation of parallel hashing over large-scale data

Long Cheng, Spyros Kotoulas, Tomas E Ward, Georgios Theodoropoulos
2014 2014 21st International Conference on High Performance Computing (HiPC)  
In this work, using such a method, we propose a high-level parallel hashing framework, Structured Parallel Hashing, targeting efficiently processing massive data on distributed memory.  ...  A common data structure used in such environment is the hash tables. This paper focuses on investigating efficient parallel hash algorithms for processing large-scale data.  ...  Regardless, the method for joins focuses on workload assignment in hardware-level, such as that the size of data chunks is set to the cache size so as to minimize the cache miss etc.  ... 
doi:10.1109/hipc.2014.7116909 dblp:conf/hipc/ChengKWT14 fatcat:cj6722pe6vaothb3b2gigq5cea


Rubao Lee, Xiaoning Ding, Feng Chen, Qingda Lu, Xiaodong Zhang
2009 Proceedings of the VLDB Endowment  
However, the shared LLC can also be a performance bottleneck to concurrent queries, each of which has private data structures, such as a hash table for the widely used hash join operator, causing serious  ...  In this paper, we propose a hybrid system method called MCC-DB for accelerating executions of warehouse-style queries, which relies on the DBMS knowledge of data access patterns to minimize LLC conflicts  ...  We thank the anonymous referees for their comments. We also thank our colleague Bill Bynum for reading this paper and his comments.  ... 
doi:10.14778/1687627.1687670 fatcat:haewfaorirefxmwsewj6253mdy

SAHA: A String Adaptive Hash Table for Analytical Databases

Tianqi Zheng, Zhibin Zhang, Xueqi Cheng
2020 Applied Sciences  
In this paper, we address some common use cases of hash tables: aggregating and joining over arbitrary string data.  ...  Hash tables are the fundamental data structure for analytical database workloads, such as aggregation, joining, set filtering and records deduplication.  ...  Acknowledgments: We thank the Yandex ClickHouse team for reviewing the SAHA code and helping merge it to the ClickHouse code base.  ... 
doi:10.3390/app10061915 fatcat:7yw3swcdnvaazpasfohabzzbbi

Versatile and scalable parallel histogram construction

Wookeun Jung, Jongsoo Park, Jaejin Lee
2014 Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14  
., cache capacity and SIMD width). This paper presents versatile histogram methods that achieve competitive performance across a wide range of input types and target architectures.  ...  Histograms are used in various fields to quickly profile the distribution of a large amount of data.  ...  Compared to hash function computation, hash table manipulation is a memory intensive task, resulting in ≤ 1.2× simd speedups.  ... 
doi:10.1145/2628071.2628108 dblp:conf/IEEEpact/JungPL14 fatcat:maaugr6st5e3rkmyz6k2udscwe

Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster

Sudhakar Singh, Rakhi Garg, P.K. Mishra
2015 International Journal of Computer Applications  
In this paper, we implement three variations of Apriori algorithm using data structures hash tree, trie and hash table trie i.e. trie with hash technique on MapReduce paradigm.  ...  Experiments are carried out on both real life and synthetic datasets which shows that hash table trie data structures performs far better than trie and hash tree in terms of execution time.  ...  Hash Using hash table enlarges the size of a node, which could not be cached in and may be moved into memory. Linear search is fast in cache and reading operation is slower for memory.  ... 
doi:10.5120/ijca2015906632 fatcat:bl34g7nlrzfv5gnqktr5cjgvuq

Energy-Efficient Hash Join Implementations in Hardware-Accelerated MPSoCs

Sebastian Haas, Gerhard P. Fettweis
2017 Very Large Data Bases Conference  
Hence, we compare two hash table designs according to their memory accesses and investigate the performance impact of the additional hashing instructions.  ...  In this paper, we study the implementation of hash join algorithms on MPSoCs and exemplarily employ the Tomahawk4 chip.  ...  ACKNOWLEDGMENTS This work has been supported in part by the state of Saxony under grant of the German Research Foundation (DFG) within the Cluster of Excellence "Center for Advancing Electronics Dresden  ... 
dblp:conf/vldb/HaasF17 fatcat:q3zq5fmryngi7oynt4jpikpkyq