193 Hits in 10.2 sec

Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size [article]

Rong Zhu, Tianjing Zeng, Andreas Pfadler, Wei Chen, Bolin Ding, Jingren Zhou
2021 arXiv   pre-print
Its key idea is to elegantly decouple the correlations across different tables and losslessly merge single table CardEst results to estimate the join query size.  ...  Whereas, the hardest problem in CardEst, i.e., how to estimate the join query size on multiple tables, has not been extensively solved.  ...  We propose Glue, a general CardEst framework that is able to merge single table CardEst results to predict join query size.  ... 
arXiv:2112.03458v1 fatcat:zsw6tzgbafbbbkduj2tjoefcyq

Solving the Join Ordering Problem via Mixed Integer Linear Programming

Immanuel Trummer, Christoph Koch
2017 Proceedings of the 2017 ACM International Conference on Management of Data - SIGMOD '17  
Our experimental results are encouraging: we are able to optimize queries joining 40 tables within less than one minute of optimization time.  ...  Such query sizes are far beyond the capabilities of traditional query optimization algorithms with worst case guarantees on plan quality.  ...  For the inner operand, we can estimate the byte size by summing over the column variables, weighted by the column byte size as well as by the cardinality of the table that the column belongs to.  ... 
doi:10.1145/3035918.3064039 dblp:conf/sigmod/Trummer017 fatcat:7a3iunoch5dhvo6rv4gzghxvzq

Scalable multi-query optimization for exploratory queries over federated scientific databases

Anastasios Kementsietsidis, Frank Neven, Dieter Van de Craen, Stijn Vansummeren
2008 Proceedings of the VLDB Endowment  
The proposed algorithms are necessarily heuristics, as computing an optimal global evaluation plan is shown to be NP-hard.  ...  The diversity and large volumes of data processed in the Natural Sciences today has led to a proliferation of highly specialized and autonomous scientific databases with inherent and often intricate relationships  ...  Essentially, Stocker et al. propose to conservatively estimate the size of an intersection R1 ∩ ... ∩ Rn by the minimum of the cardinalities |R1|, ..., |Rn|.  ... 
doi:10.14778/1453856.1453864 fatcat:mqjhglcg5jexzjckrod4c2scl4

Evaluating end-to-end optimization for data analytics applications in weld

Shoumik Palkar, Saman Amarasinghe, Samuel Madden, Matei Zaharia, James Thomas, Deepak Narayanan, Pratiksha Thaker, Rahul Palamuttam, Parimajan Negi, Anil Shanbhag, Malte Schwarzkopf, Holger Pirk
2018 Proceedings of the VLDB Endowment  
Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23× on one thread and 80× on eight threads, and its adaptive optimizations provide up to a 3.75× speedup  ...  In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications, and evaluating its impact on realistic data science workloads.  ...  This design adapts to both small and large cardinalities in the event that Weld does not have access to statistical information (e.g., cardinality estimates) at optimization time.  ... 
doi:10.14778/3213880.3213890 fatcat:oesslpgfy5awlb32xnylmjlnoa

Query evaluation techniques for large databases

Goetz Graefe
1993 ACM Computing Surveys  
On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query processing algorithms  ...  Database management systems will continue to manage large data volumes.  ...  Assuming the output cardinality (number of items) is G times less than the input cardinality (G = R/O), where G is called the average group size or the reduction factor, only the last ⌈log_F(G)⌉ merge  ... 
doi:10.1145/152610.152611 fatcat:auqfirimvjbsfo36s6guwgqc3a

Smoke: Fine-grained Lineage at Interactive Speed [article]

Fotis Psallidas, Eugene Wu
2018 arXiv   pre-print
goal to streamline queries over lineage.  ...  To this end, we introduce Smoke, an in-memory database engine that neither lineage capture overhead nor lineage query processing needs to be compromised.  ...  Also, if the join is a primary-key foreign-key join, the forward index of the foreign-key table is an rid array; since the join cardinality is the same as the foreign-key table cardinality, backward indexes  ... 
arXiv:1801.07237v1 fatcat:mgxaqqmfdrffdl4atpsfss72wu

Evita raced

Tyson Condie, David Chu, Joseph M. Hellerstein, Petros Maniatis
2008 Proceedings of the VLDB Endowment  
We demonstrate that a declarative language like OverLog is well-suited to expressing traditional and novel query optimizations as well as other query manipulations, in a compact and natural fashion.  ...  In this paper, we apply the lessons of declarative systems to the internals of a declarative engine.  ...  Acknowledgments Thanks to Goetz Graefe and Hamid Pirahesh for helpful insights and perspective, and to Kuang Chen for editorial feedback.  ... 
doi:10.14778/1453856.1453978 fatcat:4wx4le2zwrh3zd42mbbbkk22lq

Model and procedure for performance and availability-wise parallel warehouses

Pedro Furtado
2009 Distributed and parallel databases  
A query over such data may take a large amount of time to be processed in a regular PC.  ...  Nodes and network may even not be fully dedicated to the data warehouse.  ...  not result in excessive extra overheads, which is related in Figure 2 to the number and size of the partial results R(i) per chunk, and the size of R(i) in this case depends on the cardinality of the  ... 
doi:10.1007/s10619-009-7038-7 fatcat:xibt4s7ernfffdsfaiavsntc6m

DB2 Parallel Edition

C. K. Baru, G. Fecteau, A. Goyal, H. Hsiao, A. Jhingran, S. Padmanabhan, G. P. Copeland, W. G. Wilson
1995 IBM Systems Journal  
The rate of increase in database size and response-time requirements has outpaced advancements in processor and mass storage technology.  ...  Single-system (or serial) DBMSs cannot handle the capacity and the complexity requirements of these applications.  ...  A fundamental technology in Figure 3B is the mechanism that glues the nodes together to provide a single-system view to the user.  ... 
doi:10.1147/sj.342.0292 fatcat:3mso35ij6bd6jk3sgrbashfrme

An architecture for compiling UDF-centric workflows

Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Carsten Binnig, Ugur Cetintemel, Stan Zdonik
2015 Proceedings of the VLDB Endowment  
While query compilation has gained widespread popularity as a way to tackle the computation bottleneck for traditional SQL workloads, relatively little work addresses UDF-centric workflows in the domain  ...  compared to alternative systems.  ...  We would like to thank our collaborators from the Parallel Computing Lab at Intel, especially Nadathur Satish, for their valuable input.  ... 
doi:10.14778/2824032.2824045 fatcat:xfbpeg6qgvdo7n7hgoxnyuhr54

The Stratosphere platform for big data analytics

Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters (+6 others)
2014 The VLDB journal  
Acknowledgments We would like to thank the Master students that worked on the Stratosphere project and implemented many components of the system: Thomas Bodner, Christoph Brücke, Erik Nijkamp, Max Heimel  ...  Therefore, the size of intermediate results must be estimated.  ...  ., physical join implementations) rather than user-defined functions wrapped in glue code that parallelizes execution.  ... 
doi:10.1007/s00778-014-0357-y fatcat:ficnpssbvjdatjs3gewa4jdd7m

Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs

Doruk Sart, Abdullah Mueen, Walid Najjar, Eamonn Keogh, Vit Niennattrakul
2010 2010 IEEE International Conference on Data Mining  
In this work we argue that we are now close to exhausting all possible speedup from software, and that we must turn to hardware-based solutions if we are to tackle the many problems that are currently  ...  Given DTW's usefulness and ubiquity, there has been a large community-wide effort to mitigate its relative lethargy.  ...  It makes use of 128-bit SSE registers and can merge four 32-bit data to operate concurrently.  ... 
doi:10.1109/icdm.2010.21 dblp:conf/icdm/SartMNKN10 fatcat:2s3emvze2ff7jhur75bmbx77ta

Declarative Data Analytics: a Survey [article]

Nantia Makrynioti (Athens University of Economics and Business)
2019 arXiv   pre-print
The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to provide conclusions on the current state  ...  For example, in a SQL query that involves both joining tables and multiplying matrices, the order of joins can play an important role on the size of the result of matrix multiplication.  ...  It could choose to perform matrix multiplication before joining, in order to reduce the size of the matrices moving up the plan, instead of performing a series of joins and leave matrix multiplication for  ... 
arXiv:1902.01304v1 fatcat:mixepfprkjc5xayhz76bwu3px4

Mapping Features to Models: A Template Approach Based on Superimposed Variants [chapter]

Krzysztof Czarnecki, Michał Antkiewicz
2005 Lecture Notes in Computer Science  
Mapping features to other models, such as behavioral or data specifications, gives them semantics.  ...  We show how the approach can be applied to UML 2.0 activity and class models and describe a prototype implementation.  ...  The group symbol indicates group cardinality ⟨1-k⟩, where k is the group size. Thus available checkout types can be any non-empty subset of the two checkout types.  ... 
doi:10.1007/11561347_28 fatcat:dayqcz52szcmtgjt6covbfhm5m

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

Maaz Bin Safeer Ahmad, Alvin Cheung
2018 Proceedings of the 2018 International Conference on Management of Data - SIGMOD '18  
We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster compared to the original.  ...  Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code  ...  It also generates "glue" code to merge the generated code into the rest of the program.  ... 
doi:10.1145/3183713.3196891 dblp:conf/sigmod/AhmadC18 fatcat:lyqzd4awyna23esoairgfba6rq