Flare: Native Compilation for Heterogeneous Workloads in Apache Spark [article]

Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, Tiark Rompf
2017 arXiv   pre-print
State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance.  ...  We demonstrate order of magnitude speedups both for relational workloads such as TPC-H, as well as for a range of machine learning kernels that combine relational and iterative functional processing.  ...  ., SQL, Machine Learning, graphs and matrices), provides parallel patterns and generates code for heterogeneous targets.  ... 
arXiv:1703.08219v1 fatcat:3euaew55sbe5le72a6rxatxt7a

SQL optimization in a parallel processing database system

Nayem Rahman
2013 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)  
Accordingly, writing SQL for a parallel processing DBMS requires special attention to maintain parallel efficiency in DBMS resources usage such as CPU and I/O.  ...  The resource savings statistics based on several experiments show significant reduction of computing resources usage and improvement of parallel efficiency (PE) can be achieved by using different optimization  ...  The author also thanks Joan Schnitzer for an excellent editing job.  ... 
doi:10.1109/ccece.2013.6567832 dblp:conf/ccece/Rahman13 fatcat:xdjdojgqaremzodzlatvvvez7m

Open issues in parallel query optimization

Waqar Hasan, Daniela Florescu, Patrick Valduriez
1996 SIGMOD record  
We provide an overview of query processing in parallel database systems and discuss several open issues in the optimization of queries for parallel machines.  ...  Given a SQL query, parallel query optimization has the goal of finding a parallel plan that delivers the query result in minimal time.  ...  Parallel Query Execution A procedural plan for a SQL query is conventionally represented as an annotated query tree.  ... 
doi:10.1145/234889.234894 fatcat:kbvsxwxmxnhetg5adb47hx6bna

Shark: SQL and Rich Analytics at Scale [article]

Reynold Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica
2012 arXiv   pre-print
This allows Shark to run SQL queries up to 100x faster than Apache Hive, and machine learning programs up to 100x faster than Hadoop.  ...  It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently  ...  It does so by breaking a SQL query into multiple small queries and sending them to parallel databases for execution.  ... 
arXiv:1211.6176v1 fatcat:cdpyu3sp3bd7rcdzaaci4juayi

Shark

Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica
2013 Proceedings of the 2013 international conference on Management of data - SIGMOD '13  
This allows Shark to run SQL queries up to 100× faster than Apache Hive, and machine learning programs up to 100× faster than Hadoop.  ...  It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently  ...  It does so by breaking a SQL query into multiple small queries and sending them to parallel databases for execution.  ... 
doi:10.1145/2463676.2465288 dblp:conf/sigmod/XinRZFSS13 fatcat:qs4bvu7habd77g42mtm3m5sgoy

Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data

Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, Tiark Rompf
2018 USENIX Symposium on Operating Systems Design and Implementation  
(e.g., TensorFlow for machine learning).  ...  Spark has enabled a wide audience of users to process petabyte-scale workloads due to its flexibility and ease of use: users are able to mix SQL-style relational queries with Scala or Python code, and  ...  ., SQL, Machine Learning, graphs and matrices), provides parallel patterns and generates code for heterogeneous targets.  ... 
dblp:conf/osdi/EssertelTDBOR18 fatcat:3cg4hejnizaydixcx7j72u4ube

Flare & lantern

Grégory Essertel, Ruby Y. Tahboub, Fei Wang, James Decker, Tiark Rompf
2019 Proceedings of the VLDB Endowment  
We demonstrate an integration of Flare (an accelerator for Spark SQL), and Lantern (an accelerator for TensorFlow and PyTorch) that results in a highly optimized end-to-end compiled data path, switching between SQL and ML processing with negligible overhead.  ...  Moreover, SQL queries with ML UDFs are written in Spark SQL and optimized as part of Flare's evaluation [5].  ... 
doi:10.14778/3352063.3352097 fatcat:oqx6mho5dbdfhcsu53an64ig6u

Big Data Analytics Integrating a Parallel Columnar DBMS and the R Language

Yiqun Zhang, Carlos Ordonez, Wellington Cabrera
2016 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)  
Recently, novel columnar DBMSs have shown to provide orders of magnitude improvement in SQL query processing speed, preserving the parallel speedup of row-based parallel DBMSs.  ...  Our algorithms are based on a combination of SQL queries, user-defined functions (UDFs) and R calls, where SQL queries and UDFs compute data set summaries that are sent to R to compute models in RAM.  ...  Query Processing Optimizations From a query processing perspective, we carefully analyzed each query plan, identifying key optimizations for a columnar architecture.  ... 
doi:10.1109/ccgrid.2016.94 dblp:conf/ccgrid/0001OC16 fatcat:qtdk36zro5bjxologekd6u24va

Integration of large-scale data processing systems and traditional parallel database technology

Azza Abouzied, Daniel J. Abadi, Kamil Bajda-Pawlikowski, Avi Silberschatz
2019 Proceedings of the VLDB Endowment  
We built a prototype, HadoopDB, and demonstrated that it can deliver the high SQL query performance and efficiency of parallel database management systems while still providing the scalability, fault tolerance  ...  systems and parallel database technology.  ...  The model presents several inefficiencies for parallel structured query processing, such as: (1) Complex SQL queries can require a large number of operators.  ... 
doi:10.14778/3352063.3352145 fatcat:qnwfplmf3jgodaw7tsu3kwjsnq

A Common Runtime for High Performance Data Analysis

Shoumik Palkar, James J. Thomas, Anil Shanbhag, Malte Schwarzkopf, Saman P. Amarasinghe, Matei Zaharia
2017 Conference on Innovative Data Systems Research  
It then performs key data movement optimizations and generates efficient parallel code for the whole workflow.  ...  Weld uses a common intermediate representation to capture the structure of diverse data-parallel workloads, including SQL, machine learning and graph analytics.  ...  ACKNOWLEDGEMENTS Sam Madden contributed significantly to the development of this project.  ... 
dblp:conf/cidr/PalkarTSSAZ17 fatcat:sk2icgs7yrd7hi5iuivr2hrbbe

SCOPE

Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou
2008 Proceedings of the VLDB Endowment  
The language is designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. SCOPE borrows several features from SQL.  ...  In this paper, we present a new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), targeted for this type of massive data analysis.  ...  for bravely suffering through early versions of SCOPE; Grace Zhang for regression testing; Achint Srivastava for contributions to the runtime; Daniel Dedu-Constantin for contributions to the design; Andrew  ... 
doi:10.14778/1454159.1454166 fatcat:54unhznaxzgqfcdcgg76jcf4qi

The Performance of SQL-on-Hadoop Systems - An Experimental Study

Xiongpai Qin, Yueguo Chen, Jun Chen, Shuai Li, Jiesi Liu, Huijie Zhang
2017 2017 IEEE International Congress on Big Data (BigData Congress)  
This leads to the quick emergence of dozens of SQL-on-Hadoop systems that try to support interactive SQL query processing to the data stored in HDFS.  ...  According to the results, we show that such systems can benefit more from applications of many parallel query processing techniques that have been widely studied in the traditional massively parallel processing  ...  Acknowledgments The work is partially supported by the Ministry of Science  ... 
doi:10.1109/bigdatacongress.2017.68 dblp:conf/bigdata/QinCCLLZ17 fatcat:hd5vz6rybjfdzdpa5lm37hzw3e

SQLMR : A Scalable Database Management System for Cloud Computing

Meng-Ju Hsieh, Chao-Rui Chang, Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu
2011 2011 International Conference on Parallel Processing  
SQLMR compiles SQL-like queries to a sequence of MapReduce jobs.  ...  MapReduce provides a framework for large data processing and is shown to be scalable and fault-tolerant on commodity machines.  ...  We would also like to thank the Academia Sinica Computing Center for providing computing and storage facilities.  ... 
doi:10.1109/icpp.2011.54 dblp:conf/icpp/HsiehCHWL11 fatcat:47zwoz2ringx5g7w3eopk6z6a4

Database systems research on data mining

Carlos Ordonez, Javier García-García
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
We pay particular attention to SQL and MapReduce as two competing technologies for large scale processing. We conclude with a summary of solved major problems and open research issues.  ...  We focus on the computation of well-known multidimensional statistical and machine learning models.  ...  In SQL a relational operator is generally evaluated with data parallelism and query evaluation becomes a sequence of physical operators acting on tables.  ... 
doi:10.1145/1807167.1807335 dblp:conf/sigmod/OrdonezG10 fatcat:ppgnhhnzorhkpbrpqz7etrz6j4

Can we analyze big data inside a DBMS?

Carlos Ordonez
2013 Proceedings of the sixteenth international workshop on Data warehousing and OLAP - DOLAP '13  
On the other hand, for data analytics in a broad sense, there are plenty of non-DBMS tools including statistical languages, matrix packages, generic data mining programs and large-scale parallel systems  ...  Thus it would seem a DBMS is not a good technology to analyze big data, going beyond SQL queries, acting just as a reliable and fast data repository.  ...  In row stores tables are horizontally partitioned into sets of rows for parallel processing.  ... 
doi:10.1145/2513190.2513198 dblp:conf/dolap/Ordonez13 fatcat:ejuvzywvqrd5jig6bfcmixc66m