Lecture Notes in Computer Science
As the amount of available RDF data continues to increase steadily, there is growing interest in developing efficient methods for analyzing such data. While recent efforts have focused on efficient methods for traditional data processing, analytical processing, which typically involves more complex queries, has received much less attention. Cost-effective parallelization techniques such as Google's Map-Reduce offer significant promise for achieving Web-scale analytics. However, currently available implementations are designed for simple data processing on structured data. In this paper, we present RAPID, a language for scalable ad-hoc analytical processing of RDF data on Map-Reduce frameworks. It builds on Yahoo's Pig Latin by introducing primitives based on a specialized join operator, the MD-join, for expressing analytical tasks in a manner more amenable to parallel processing, as well as primitives for coping with the semi-structured nature of RDF data. Experimental evaluation results demonstrate significant performance improvements for analytical processing of RDF data over existing Map-Reduce based techniques.
doi:10.1007/978-3-642-04930-9_45
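The core idea behind the MD-join primitive is that aggregates over a base relation can be computed in a single scan that updates aggregate slots keyed by the grouping attribute, rather than an explicit join followed by a GROUP BY. A minimal Python sketch of that evaluation style, using a hypothetical triple set (the names and data are illustrative only, not the paper's benchmark):

```python
from collections import defaultdict

# Hypothetical triples (subject, property, object); illustrative data only.
triples = [
    ("v1", "type", "Vendor"), ("v1", "country", "US"),
    ("o1", "vendor", "v1"),   ("o1", "price", 40),
    ("o2", "vendor", "v1"),   ("o2", "price", 25),
]

# MD-join-style evaluation sketch: one scan of the base data updates
# aggregate slots keyed by the grouping attribute (here, the vendor),
# instead of materializing an explicit join result first.
agg = defaultdict(lambda: {"count": 0, "total": 0})
for s, p, o in triples:
    if p == "price":                      # the MD-join condition
        vendor = next(v for sub, pr, v in triples
                      if sub == s and pr == "vendor")  # kept naive for clarity
        agg[vendor]["count"] += 1
        agg[vendor]["total"] += o

print(dict(agg))   # {'v1': {'count': 2, 'total': 65}}
```

The actual RAPID primitives run inside Pig on a Map-Reduce cluster; the linear lookup here stands in for what the mapper/reducer partitioning provides.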
Scalable processing of Semantic Web queries has become a critical need given the rapid upward trend in the availability of Semantic Web data. The MapReduce paradigm is emerging as a platform of choice for large-scale data processing and analytics due to its ease of use, cost effectiveness, and potential for unlimited scaling. Processing queries on Semantic Web triple models is a challenge on the mainstream MapReduce platform, Apache Hadoop, and its extensions such as Pig and Hive. This is because such queries require numerous joins, which leads to lengthy and expensive MapReduce workflows. Further, in this paradigm, cloud resources are acquired on demand, and traditional join optimization machinery such as statistics and indexes is often absent or not easily supported. In this demonstration, we will present RAPID+, an extended Apache Pig system that uses an algebraic approach for optimizing queries on RDF data models, including queries involving inferencing. The basic idea is that by using logical and physical operators that are more natural to MapReduce processing, we can reinterpret such queries in a way that leads to more concise execution workflows and small intermediate data footprints that minimize disk I/O and network transfer overhead. RAPID+ evaluates queries using the Nested TripleGroup Data Model and Algebra (NTGA). The demo will show a comparative evaluation of NTGA query plans vs. the relational algebra-like query plans used by Apache Pig and Hive.
[Figure 3: The architecture and dataflow of RAPID+ (SPARQL parser, schema-aware and rule-based rewriters, logical plan optimizer, Pig Latin and NTGA plan generators, logical-to-physical plan translator, MapReduce job compiler, Hadoop cluster).]
doi:10.1145/2487788.2487917 dblp:conf/www/KimRA13
Lecture Notes in Computer Science
Existing MapReduce systems support relational-style join operators, which translate multi-join query plans into several Map-Reduce cycles. This leads to high I/O and communication costs due to the multiple data transfer steps between map and reduce phases. SPARQL graph pattern matching is dominated by join operations and is unlikely to be processed efficiently using existing techniques. This cost is prohibitive for RDF graph pattern matching queries, which typically involve several joins. In this paper, we propose an approach for optimizing graph pattern matching by reinterpreting certain join tree structures as grouping operations. This enables a greater degree of parallelism in join processing, resulting in more "bushy" query execution plans with fewer Map-Reduce cycles. This approach requires that intermediate results be managed as sets of groups of triples, or TripleGroups. We therefore propose a data model and algebra, the Nested TripleGroup Algebra (NTGA), for capturing and manipulating TripleGroups. The relationship with the traditional relational-style algebra used in Apache Pig is discussed. A comparative performance evaluation of the traditional Pig approach and RAPID+ (Pig extended with NTGA) for graph pattern matching queries on the BSBM benchmark dataset is presented. Results show up to 60% performance improvement of our approach over traditional Pig for some tasks.
doi:10.1007/978-3-642-21064-8_4
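The reinterpretation of a star-shaped join tree as a grouping operation can be sketched in a few lines: one grouping pass by subject produces TripleGroups, whereas a relational plan would need one self-join per additional star edge. A minimal Python sketch, with hypothetical data and property names (not the BSBM schema):

```python
from collections import defaultdict

# Hypothetical triples; STAR lists the properties of a star pattern.
triples = [
    ("p1", "label", "Widget"), ("p1", "price", 30), ("p1", "vendor", "v1"),
    ("p2", "label", "Gadget"), ("p2", "vendor", "v2"),   # incomplete star
]
STAR = {"label", "price", "vendor"}

# TripleGroup formation sketch: a single grouping pass by subject replaces
# the (len(STAR) - 1) self-joins a relational-style plan would schedule.
groups = defaultdict(list)
for s, p, o in triples:
    if p in STAR:
        groups[s].append((p, o))

# Keep only "complete" TripleGroups, i.e. those matching every star edge.
matches = {s: tg for s, tg in groups.items()
           if {p for p, _ in tg} == STAR}
print(matches)   # only p1 satisfies the full star
```

In the Map-Reduce setting this grouping is one shuffle on the subject key, which is why NTGA plans need fewer cycles than join-per-edge plans.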
Lecture Notes in Computer Science
The recent big data movement has resulted in a surge of activity on layering declarative languages on top of distributed computation platforms. In the Semantic Web realm, this surge of analytics languages has not been reflected, despite the significant growth in available RDF data. Consequently, when analysing large RDF datasets, users are left with two main options: using SPARQL or using an existing non-RDF-specific big data language, each with its own limitations. The purely declarative nature of SPARQL and the high cost of evaluation can be limiting in some scenarios. On the other hand, existing big data languages are designed mainly for tabular data and, therefore, applying them to RDF data results in verbose, unreadable, and sometimes inefficient scripts. In this paper, we introduce SYRql, a dataflow language designed to process RDF data at large scale. SYRql blends concepts from both SPARQL and existing big data languages. We formally define a closed algebra that underlies SYRql and discuss its properties and some unique optimisation opportunities this algebra provides. Furthermore, we describe an implementation that translates SYRql scripts into a series of MapReduce jobs and compare its performance to other big data processing languages.
doi:10.1007/978-3-319-11964-9_10
Flexible exploration of large RDF datasets with unknown relationships can be enabled using 'unbound-property' graph pattern queries. Relational-style processing of such queries using normalized relations results in redundant information in intermediate results due to the repetition of adjoining bound (fixed) properties. Such redundancy negatively impacts disk I/O, network transfer costs, and the disk space required while processing RDF query workloads on MapReduce-based systems. This work proposes packing and lazy unpacking strategies to minimize redundancy in intermediate results while processing unbound-property queries. In addition to keeping results compact, this work evaluates RDF queries using the Nested TripleGroup Data Model and Algebra (NTGA), which enables shorter MapReduce execution workflows. Experimental results demonstrate the benefit of this work over RDF query processing using relational-style systems such as Apache Pig and Hive.
doi:10.1145/2487788.2487872 dblp:conf/www/RavindraA13
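The redundancy the packing strategy targets is easy to see in miniature: when a bound star joins an unbound-property edge, a flat relational result repeats the entire bound part once per matching edge. A small Python sketch of the contrast, with hypothetical field names (this is an illustration of the idea, not the paper's on-disk format):

```python
# Hypothetical intermediate result: one bound star plus the matches of an
# unbound-property edge (?s ?p ?o). Names are illustrative only.
bound_star = {"s": "p1", "label": "Widget", "price": 30}
unbound_edges = [("p1", "madeBy", "v1"),
                 ("p1", "soldBy", "v2"),
                 ("p1", "ratedBy", "u9")]

# Relational-style result: the bound star is repeated once per unbound edge,
# so its fields are shuffled and spilled three times over.
unpacked = [{**bound_star, "p": p, "o": o} for _, p, o in unbound_edges]

# Packed representation: the bound part is stored once with the unbound
# edges nested inside; unpacking into flat rows is deferred ("lazy") until
# a downstream operator actually needs them.
packed = (bound_star, [(p, o) for _, p, o in unbound_edges])

print(len(unpacked), "flat rows vs 1 packed group")
```

The saving grows with both the number of bound properties and the number of unbound matches, which is where the reported I/O reductions come from.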
Scalable query processing relies on early and aggressive determination and pruning of query-irrelevant data. Besides traditional space-pruning techniques such as indexing, type-based optimizations that exploit integrity constraints defined on types can be used to rewrite queries into more efficient ones. However, such optimizations are applicable only to strongly-typed data and query models, which makes them a challenge for semi-structured models such as RDF. Consequently, developing techniques for enabling type-based query optimizations will contribute new insight toward improving the scalability of RDF processing systems. In this paper, we address the challenge of type-based query optimization for RDF graph pattern queries. The approach comprises (i) a novel type system for RDF data induced from data and ontologies and (ii) a query optimization and evaluation framework for evaluating graph pattern queries using type-based optimizations. An implementation of this approach integrated into Apache Pig is presented and evaluated. Comprehensive experiments conducted on real-world and synthetic benchmark datasets show that our approach is up to 500X faster than existing approaches.
doi:10.1145/3038912.3052655 dblp:conf/www/KimRA17
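The payoff of a type system induced from data and ontologies is that query-irrelevant triples can be discarded before any join or grouping work is scheduled. A minimal Python sketch of that pruning step, with hypothetical type assignments (the induced type system in the paper is considerably richer than this lookup table):

```python
# Hypothetical induced type assignments and a query that only touches
# subjects typed as "Product"; names are illustrative only.
subject_type = {"p1": "Product", "p2": "Product", "r1": "Review"}
triples = [
    ("p1", "price", 30), ("p2", "price", 45),
    ("r1", "rating", 4), ("r1", "text", "ok"),
]
QUERY_TYPES = {"Product"}

# Type-based pruning sketch: filter query-irrelevant triples early, so the
# downstream join/grouping operators never see them.
relevant = [t for t in triples if subject_type.get(t[0]) in QUERY_TYPES]
print(relevant)   # the Review triples are pruned
```

In a MapReduce plan this filter runs map-side, so pruned triples are never shuffled, which is what makes early type-based pruning so effective.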
Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud - MDAC '10
In order to exploit the growing amount of RDF data in decision-making, there is an increasing demand for analytics-style processing of such data. RDF data is modeled as a labeled graph that represents a collection of binary relations (triples). In this context, analytical queries can be interpreted as consisting of three main constructs, namely pattern matching, grouping, and aggregation, and require several join operations to reassemble the triples into n-ary relations relevant to the given query, unlike traditional OLAP systems where data is suitably organized. MapReduce-based parallel processing systems like Pig have had success in processing scalable analytical workloads. However, these systems offer only relational algebra style operators, which would require an iterative n-tuple reassembly process in which intermediate results need to be materialized. This leads to high I/O costs that negatively impact performance. In this paper, we propose UDFs that (i) re-factor analytical processing on RDF graphs in a way that enables more parallelized processing and (ii) perform look-ahead processing to reduce the cost of subsequent operators in the query execution plan. These functions have been integrated into the Pig Latin function library, and the experimental results show up to 50% improvement in execution times for certain classes of queries. An important impact of this work is that it could serve as the foundation for additional physical operators in systems such as Pig for more efficient graph processing. Processing of RDF data usually requires several joins and grouping operations, which cannot effectively be pushed to the database. Yet another approach optimizes multi-way joins by providing strategies to efficiently partition and replicate the tuples of a relation across reducer processes in a way that minimizes communication cost. This work is complementary to our approach, and by integrating its partitioning scheme into Pig, we can further improve the performance of join operations. The RDF community has also recently embraced the parallel data processing paradigm described by the MapReduce model, and there have been efforts to perform scalable RDF reasoning by materializing the closure of the related graph and hence perform efficient reasoning using the resultant ordering of inference rules. There have also been MapReduce-based approaches for pattern matching by decomposing graphs into RDF molecules.
doi:10.1145/1779599.1779604
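The look-ahead idea in (ii) can be illustrated compactly: instead of first materializing grouped tuples and then running an aggregation operator over them, a single pass computes the next operator's input while grouping. A hedged Python sketch with hypothetical data (the actual UDFs are Pig Latin functions, not this Python):

```python
from collections import defaultdict

# Hypothetical triples; illustrative data only.
triples = [
    ("v1", "price", 40), ("v1", "price", 25), ("v2", "price", 10),
]

def group_with_lookahead(stream):
    """Single pass that groups by subject AND computes the aggregate the
    next operator in the plan needs, so the grouped tuples themselves are
    never materialized as an intermediate result."""
    totals = defaultdict(int)
    for s, p, o in stream:
        if p == "price":
            totals[s] += o
    return totals

print(dict(group_with_lookahead(triples)))  # {'v1': 65, 'v2': 10}
```

Avoiding the materialized intermediate is exactly the I/O saving the abstract attributes to the look-ahead UDFs.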
Proceedings of the second international workshop on Data intensive computing in the clouds - DataCloud-SC '11
Broadened adoption of the Linking Open Data tenets has led to a significant surge in the amount of Semantic Web data, particularly RDF data. This has positioned scalable data processing techniques for RDF as a central issue in the Semantic Web research community. The RDF data model is a fine-grained model representing relationships as binary relations. Thus, answering queries (typically graph pattern matching queries) over RDF data requires several join operations to reassemble related data. While MapReduce-based processing is emerging as the de facto paradigm for processing large-scale data, it is known to be inefficient for join-intensive workloads. In addition, most existing techniques for optimizing RDF data processing do not transfer well to the MapReduce model and often require significant lead time for pre-processing. Such a requirement may not be desirable in on-demand cloud database scenarios where the goal is to reduce Time-To-Result (TTR). In this position paper, we argue that some of these challenges can be overcome by rethinking the operators for graph pattern processing, as well as by adopting dynamic optimization techniques that exploit information from previous execution steps in the current execution step. We present some preliminary evaluation results of the proposed techniques.
doi:10.1145/2087522.2087527
2014 IEEE 30th International Conference on Data Engineering Workshops
Session 2: Query Processing, 335: Towards Optimization of RDF Analytical Queries on MapReduce (Padmashree ...)
doi:10.1109/icdew.2014.6818287
2014 IEEE 30th International Conference on Data Engineering Workshops
Ravindra, Padmashree, 335: Towards Optimization of RDF Analytical Queries on MapReduce
doi:10.1109/icdew.2014.6818289