Rethinking Distributed Query Execution on High-Speed Networks

Abdallah Salama, Carsten Binnig, Tim Kraska, Ansgar Scherp, Tobias Ziegler
2017 IEEE Data Engineering Bulletin  
However, all these novel RDMA-based query operators are still designed for a classical shared-nothing architecture that relies on a shuffle-based execution model to redistribute the data.  ...  Our experiments show that in the best case our prototype database system called I-Store, which is designed for fast networks from scratch, provides 3× speed-up over a shuffle-based execution model that  ...  Exp. 1: System Evaluation. As data set, we used the schema and data generator of the Star Schema Benchmark (SSB) [10] and created a database of SF = 100.  ... 
dblp:journals/debu/SalamaBKSZ17 fatcat:7sjaut5by5gbvjodrevowtsmb4

High-Speed Query Processing over High-Speed Networks [article]

Wolf Roediger, Tobias Muehlbauer, Alfons Kemper, Thomas Neumann
2015 arXiv   pre-print
An extensive evaluation within the HyPer database system using the TPC-H benchmark shows that our holistic approach indeed enables high-speed query processing over high-speed networks.  ...  It consists of two parts: First, hybrid parallelism that distinguishes local and distributed parallelism for better scalability in both the number of cores as well as servers.  ...  Discussion: In the previous sections we discussed how to tune TCP and RDMA for analytical database workloads that shuffle large amounts of data between servers.  ... 
arXiv:1502.07169v4 fatcat:girygakxsfewrp3u6jcyz54oea

High-speed query processing over high-speed networks

Wolf Rödiger, Tobias Mühlbauer, Alfons Kemper, Thomas Neumann
2015 Proceedings of the VLDB Endowment  
An extensive evaluation within the HyPer database system using the TPC-H benchmark shows that our holistic approach indeed enables high-speed query processing over high-speed networks.  ...  It consists of two parts: First, hybrid parallelism that distinguishes local and distributed parallelism for better scalability in both the number of cores as well as servers.  ...  Discussion: In the previous sections we discussed how to tune TCP and RDMA for analytical database workloads that shuffle large amounts of data between servers.  ... 
doi:10.14778/2856318.2856319 fatcat:aqpagqu5ibc4li5rm65nwh5efu

Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms [article]

Dimitrios Koutsoukos, Ingo Müller, Renato Marroquín, Ana Klimovic, Gustavo Alonso
2021 arXiv   pre-print
The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems.  ...  To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e.  ...  Determining the right granularity of operators has been a recurring topic of research: from the bracket model [23] for parallelization in the early days of databases to object-oriented modular designs  ... 
arXiv:2004.03488v2 fatcat:e7ft5oersvcarms2wsagxyopwa

The End of Slow Networks: It's Time for a Redesign [article]

Carsten Binnig, Andrew Crotty, Alex Galakatos, Tim Kraska, Erfan Zamanian
2015 arXiv   pre-print
Next generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs.  ...  In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network.  ...  Thus, we believe that there is a need for parallel cache-aware algorithms for query operators over RDMA.  ... 
arXiv:1504.01048v2 fatcat:yezftuvowbdwlgwet6fgmyjo7y

To Ship or Not to (Function) Ship (Extended version) [article]

Feilong Liu, Niranjan Kamat, Spyros Blanas, Arnab Nandi
2018 arXiv   pre-print
Whether function shipping or data shipping should be preferred depends on the amount of data transferred, the current CPU utilization, the sampling method and the number of queries executed over the data  ...  The established parallel data processing paradigm relies on function shipping, where a coordinator dispatches queries to worker nodes and then collects the results.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.  ... 
arXiv:1807.11149v1 fatcat:3hhqugoqtfamtawwo5uujnbnd4

Rack-Scale In-Memory Join Processing using RDMA

Claude Barthels, Simon Loesing, Gustavo Alonso, Donald Kossmann
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
Database systems running on a cluster of machines, i.e. rack-scale databases, are a common architecture for many large databases and data appliances.  ...  The results of this paper are, to our knowledge, the first detailed analysis of parallel hash joins using RDMA.  ...  The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work has been funded in part by a grant from Oracle Labs.  ... 
doi:10.1145/2723372.2750547 dblp:conf/sigmod/BarthelsLAK15 fatcat:bwciklysjbcfvlivxiiyer4kqm

RDMA Communication Patterns

Tobias Ziegler, Viktor Leis, Carsten Binnig
2020 Datenbank-Spektrum  
the communication patterns of scale-out systems.  ...  While there have been some initial studies included in papers that aim to investigate selected performance characteristics of particular design choices, there has not been a systematic study to evaluate  ... 
doi:10.1007/s13222-020-00355-7 fatcat:t6qnyrtt7feepg4bero4ce4wgq

Query Fresh

Tianzheng Wang, Ryan Johnson, Ippokratis Pandis
2017 Proceedings of the VLDB Endowment  
Query Fresh avoids the dual-copy design and treats the log as the database, enabling lightweight, parallel log replay that does not block the primary.  ...  Hot standby systems often have to trade safety (i.e., not losing committed work) and freshness (i.e., having access to recent updates) for performance.  ...  Acknowledgements: We would like to thank the team behind the Apt cluster at the University of Utah and Hewlett Packard Labs for their help on provisioning and configuring machines for our experiments.  ... 
doi:10.1145/3186728.3164137 fatcat:kr3lpn2krbfivik7jrjkc7t3by

K MapReduce: A scalable tool for data-processing and search/ensemble applications on large-scale supercomputers

Motohiko Matsuda, Naoya Maruyama, Shin'ichiro Takizawa
2013 2013 IEEE International Conference on Cluster Computing (CLUSTER)  
Its objectives are to ease programming for data-processing and to achieve efficiency by utilizing the large amount of memory available in large scale supercomputers.  ...  Sorting is optimized using fixed length packed keys instead of variable-length raw keys, which is extensively used inside shuffling and reducing operations.  ...  Synchronous Operations There are two major design choices for the mode of operations.  ... 
doi:10.1109/cluster.2013.6702663 dblp:conf/cluster/MatsudaMT13 fatcat:w5geh2geszc2rnb7453sisooby

Analyzing efficient stream processing on modern hardware

Steffen Zeuch, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl
2019 Proceedings of the VLDB Endowment  
To this end, we conduct an extensive experimental analysis of current SPEs and SPE design alternatives optimized for modern hardware.  ...  We show that the single-node throughput can be increased by up to two orders of magnitude compared to state-of-the-art SPEs by applying specialized code generation, fusing operators, batch-style parallelization  ...  This work was funded by the EU projects E2Data (780245), DFG Priority Program "Scalable Data Management for Future Hardware" (MA4662-5), and the German Ministry for Education and Research as BBDC I (01IS14013A  ... 
doi:10.14778/3303753.3303758 fatcat:3ugpwvys3vf2vba2npn2n2t47m

A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures [article]

Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha, Geoffrey C. Fox
2014 arXiv   pre-print
resources, and storing and transferring large volumes of data.  ...  We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm.  ...  This work has also been made possible thanks to computer resources provided by XRAC award TG-MCB090174 and an Amazon Computing Award to SJ.  ... 
arXiv:1403.1528v2 fatcat:dnyrpncqfneofaxyuvq3tzffz4

Hardware-Conscious Stream Processing: A Survey [article]

Shuhao Zhang, Feng Zhang, Yingjun Wu, Bingsheng He, Paul Johns
2020 arXiv   pre-print
Data stream processing systems (DSPSs) enable users to express and run stream applications to continuously process data streams.  ...  To achieve real-time data analytics, recent research has focused on optimizing system latency and throughput.  ...  The authors would like to thank the anonymous reviewer and the associate editor, Pınar Tözün, for their insightful comments on improving this manuscript.  ... 
arXiv:2001.05667v1 fatcat:hga7siyyzvbavilpxvxjofvtii

Mainlining Databases: Supporting Fast Transactional Workloads on Universal Columnar Data File Formats [article]

Tianyu Li, Matthew Butrovich, Amadou Ngom, Wan Shen Lim, Wes McKinney, Andrew Pavlo
2020 arXiv   pre-print
We aim to reduce or even eliminate this process by developing a storage architecture for in-memory database management systems (DBMSs) that is aware of the eventual usage of its data and emits columnar  ...  The advantage of these formats is that they help organizations avoid repeatedly converting data to a new format for each application.  ...  ORC is a self-describing type-aware columnar file format designed for Hadoop. It divides data into stripes that are similar to our concept of blocks.  ... 
arXiv:2004.14471v1 fatcat:cf3fma5wlbamzmxzvvgqg5himi

Procella: Unifying serving and analytical data at YouTube

Biswapesh Chattopadhyay, Sagar Mittal, Roee Ebenstein, Nikita Mikhaylin, Hung-ching Lee, Xiaoyan Zhao, Tony Xu, Luis Perez, Farhad Shahmohammadi, Tran Bui, Neil McKay, Priyam Dutta (+10 others)
2019 Proceedings of the VLDB Endowment  
Large organizations like YouTube are dealing with exploding data volume and increasing demand for data driven applications.  ...  This, however, creates silos of data and processing, and results in a complex, expensive, and harder-to-maintain infrastructure.  ...  ACKNOWLEDGEMENTS: We would like to thank the Dremel team, especially Mosha Pasumansky, for working closely with us on using the Capacitor format and enabling reuse of the Dremel interfaces.  ... 
doi:10.14778/3352063.3352121 fatcat:ccnemrteevg2xm6lcy67peck6a
Showing results 1–15 out of 91 results