Recursive Data Structures in SPARK [chapter]

Claire Dross, Johannes Kanig
2020 Lecture Notes in Computer Science  
In particular, we consider pointer-based recursive data structures, and discuss how they are supported in SPARK.  ...  To avoid introducing a memory model and to stay in the first-order logic background of SPARK, the relation between the iterator and the underlying structure is encoded as a predicate which is maintained  ...  [1], that enables proofs about recursive pointer-based data structures in SPARK.  ... 
doi:10.1007/978-3-030-53291-8_11 fatcat:awiwdmdahfafvl7ksfksnlyj3i

RECURSIVE JOIN PROCESSING IN BIG DATA ENVIRONMENT

Anh-Cang Phan, Thanh-Ngoan Trieu, Thuong-Cang Phan
2021 Journal of Computer Science and Cybernetics  
In this study, we thus propose a simple but efficient approach for Big recursive joins based on reducing by half the number of the required iterations in the Spark environment.  ...  However, it also offers many challenges in the way data is processed and queried over time. A join operation is one of the most common operations appearing in many data queries.  ...  Section 2 presents the background related join operations in Spark environment. Section 3 provides our proposal to effectively process recursive join in Big Data environment.  ... 
doi:10.15625/1813-9663/37/2/15889 fatcat:6xost7gd55de7duppgnfnjucou

Big Data Analytics with Datalog Queries on Spark

Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, Carlo Zaniolo
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark.  ...  Developing efficient complex analytics in Spark requires deep understanding of both the algorithm at hand and the Spark API or subsystem APIs (e.g., Spark SQL, GraphX).  ...  We introduce recursion operators and data structures to efficiently implement the technique in Spark. • We propose physical planning and scheduler optimizations for recursive queries in Spark, including  ... 
doi:10.1145/2882903.2915229 pmid:28626296 pmcid:PMC5470845 dblp:conf/sigmod/ShkapskyYICCZ16 fatcat:fw2fje66wfaipfvhax5bi4mim4

BigSR: an empirical study of real-time expressive RDF stream reasoning on modern Big Data platforms [article]

Xiangnan Ren and Olivier Curé and Hubert Naacke and Guohui Xiao
2018 arXiv   pre-print
Accordingly, we implement BigSR on top of Apache Spark Streaming (BSP model) and Apache Flink (RAT model).  ...  Our experiments show that BigSR over both BSP and RAT generally scales up to high throughput beyond million-triples per second (with or without recursion), and RAT attains sub-millisecond delay for stateless  ...  Data Structure.  ... 
arXiv:1804.04367v1 fatcat:up2rgtt4bbc67ohpy3qwzsyn6y

A Theoretical and Experimental Comparison of Large-Scale Join Algorithms in Spark

Anh-Cang Phan, Thuong-Cang Phan, Thanh-Ngoan Trieu, Thi-To-Quyen Tran
2021 SN Computer Science  
Generally, the two-way and recursive joins using filters are the best choices while performing in the Spark environment.  ...  This research systematically presents a theoretical and experimental comparison of the prominent join algorithms in the Spark environment.  ...  Cost Model for Recursive Join The general cost model for recursive join in Spark is presented in Eq. 9.  ... 
doi:10.1007/s42979-021-00738-x fatcat:kqlktbs6inbyvj5stoabaammlu

Let High-level Graph Queries Be Parallel Efficient: An Approach Over Structural Recursion On Pregel

Chong Li, Le-Duc Tung, Xiaodong Meng, Zhenjiang Hu
2016 Journal of Information Processing  
Therefore, the complexity in developing efficient structural recursive functions is relaxed by our solution.  ...  However, most of the previous studies about graph structural recursion do not exploit in practical the power of parallel computing.  ...  Section 2 reviews our data model and the graph structural recursion. Section 3 shows how structural recursion can be evaluated efficiently in parallel.  ... 
doi:10.2197/ipsjjip.24.928 fatcat:vex7gd3il5g33iksygvhepzwqy

Scaling-up reasoning and advanced analytics on BigData

TYSON CONDIE, ARIYAM DAS, MATTEO INTERLANDI, ALEXANDER SHKAPSKY, MOHAN YANG, CARLO ZANIOLO
2018 Theory and Practice of Logic Programming  
The paper describes how (i) was addressed by simple rules under which the fixpoint semantics extends to programs using count, sum and extrema in recursion, and (ii) was tamed by parallel compilation techniques  ...  BigDatalog is an extension of Datalog that achieves performance and scalability on both Apache Spark and multicore systems to the point that its graph analytics outperform those written in GraphX  ...  This work was supported in part by NSF grants IIS-1218471, IIS-1302698 and CNS-1351047, and U54EB020404 awarded by NIH Big Data to Knowledge (BD2K).  ... 
doi:10.1017/s1471068418000418 fatcat:xvfcjy4fi5ctvpesstdhqrhsvq

Emma in Action

Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, Volker Markl
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
In addition, Emma also advocates quoting the entire data analysis algorithm rather than its individual dataflow expressions.  ...  To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions  ...  We use the theory of recursive data types to model the parallel collection types -DataSet in Flink and RDD in Spark -which form the core of the targeted parallel dataflow engines.  ... 
doi:10.1145/2882903.2899396 dblp:conf/sigmod/AlexandrovSKKM16 fatcat:hre3crgnj5dp5mjiq22etivfpu

Spark-Based Large-Scale Matrix Inversion for Big Data Processing

Jun Liu, Yang Liang, Nirwan Ansari
2016 IEEE Access  
We present its well-designed implementation with optimized data structure, reduction of space complexity and effective matrix multiplication on the Spark parallel computing platform.  ...  In this paper, we present a LU decomposition-based block-recursive algorithm for large-scale matrix inversion.  ...  Data Structure Design on Spark In each round of the recursion, the Spark-based implementation of the proposed algorithm requires distributing the huge input matrix and the intermediate matrices across  ... 
doi:10.1109/access.2016.2546544 fatcat:npwy2xc4y5dqlbji2o4g2wniye

SPIN: A Fast and Scalable Matrix Inversion Method in Apache Spark [article]

Chandan Misra, Sourangshu Bhattacharya, Soumya K. Ghosh
2018 arXiv   pre-print
In this paper, we propose a different scheme based on Strassen's matrix inversion algorithm (mentioned in Strassen's original paper in 1969), which uses far fewer operations at each level of recursion.  ...  Existing methods for efficient and distributed matrix inversion using big data platforms rely on LU decomposition based block-recursive algorithms.  ...  All these steps are done by splitting the matrix into blocks which act as execution unit of the spark job. A brief description of the block data structure is given below.  ... 
arXiv:1801.04723v1 fatcat:vktsswwminfg7as65nteusb2za

Controlling loops in parallel mercury code

Paul Bone, Zoltan Somogyi, Peter Schachte
2012 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming - DAMP '12  
This worked very well in many cases, but in cases of tail recursion, we got much lower speedups than we expected, due to excessive memory usage.  ...  Recently we built a system that uses profiling data to automatically parallelize Mercury programs by finding conjunctions with expensive conjuncts that can run in parallel with minimal synchronization  ...  The Mercury compiler implements (G1 & G2 & . . . & Gn) by creating a data structure representing a barrier, and then spawning off (G2 & . . . & Gn) as a spark.  ... 
doi:10.1145/2103736.2103739 dblp:conf/popl/BoneSS12 fatcat:doio7o4kbbdk5g2b4mnrlxepbu

Querying Semantic Knowledge Bases with SQL-on-Hadoop

Martin Przyjaciel-Zablocki, Alexander Schätzle, Georg Lausen
2017 Proceedings of the 4th Algorithms and Systems on MapReduce and Beyond - BeyondMR'17  
We present a new version of our TriAL-QL processor, which takes advantage of the current momentum in in-memory SQL-on-Hadoop solutions and is built on top of Impala and SPARK while using one unified data  ...  In this paper, we continue our work on TriAL-QL, an expressive (SQL-like) RDF query language based on the Triple Algebra with Recursion [31] .  ...  Moreover, we can even use a composite execution for recursive TriAL expressions by utilizing the fact that in Spark a user can embed SQL queries in a general Scala or Java program used as a driver.  ... 
doi:10.1145/3070607.3070610 dblp:conf/sigmod/Przyjaciel-Zablocki17 fatcat:wt42dwt6infpzn2ywq6n564vlu

Scalable Time-Decaying Adaptive Prediction Algorithm

Yinyan Tan, Zhe Fan, Guilin Li, Fangshan Wang, Zhengbing Li, Shikai Liu, Qiuling Pan, Eric P. Xing, Qirong Ho
2016 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16  
., Spark and Petuum.  ...  To scale Big Data, we further parallelize our algorithm following the data parallel scheme under both BSP and SSP consistency model.  ...  Data structure. We first present the difference on data structure.  ... 
doi:10.1145/2939672.2939714 dblp:conf/kdd/TanFLWLLPXH16 fatcat:xmeo3bfzxjbrpei4u47sgrrwti

Lifetime-Based Memory Management for Distributed Data Processing Systems [article]

Lu Lu, Xuanhua Shi, Yongluan Zhou, Xiong Zhang, Hai Jin, Cheng Pei, Ligang He, Yuanzhen Geng
2016 arXiv   pre-print
In-memory caching of intermediate data and eager combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in distributed data processing systems  ...  like Spark and Flink.  ...  Table 6 : 6 Execution times of two exploratory SQL query in Spark, Spark SQL and Deca. In Query2, the size of swapped cache data in Spark is 23.1GB.  ... 
arXiv:1602.01959v3 fatcat:ds2xqnfi35d2rlt52gqaakey64

How to Win a Hot Dog Eating Contest

Milos Nikolic, Mohammad Dashti, Christoph Koch
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
In the quest for valuable information, modern big data applications continuously monitor streams of data.  ...  In this paper, we study low-latency incremental computation of complex SQL queries in both local and distributed streaming environments.  ...  DBToaster creates all relevant index structures. From our experience, most data structures produced during recursive compilation have only few indexes.  ... 
doi:10.1145/2882903.2915246 dblp:conf/sigmod/NikolicD016 fatcat:bnqsj45fh5fkzldf3rf6gx43ta