Massively parallel data analysis with PACTs on Nephele

Alexander Alexandrov, Max Heimel, Volker Markl, Dominic Battré, Fabian Hueske, Erik Nijkamp, Stephan Ewen, Odej Kao, Daniel Warneke
2010 Proceedings of the VLDB Endowment  
CONCLUSIONS We will demonstrate the Nephele/PACTs query processor, a system for massively parallel data processing based on the concept of Parallelization Contracts.  ...  INTRODUCTION Large-scale data analysis applications require processing and analyzing of Terabytes or even Petabytes of data, particularly in the areas of web analysis or scientific data management.  ... 
doi:10.14778/1920841.1921056 fatcat:qkb3dlwwmrdbnggys7pvykbk4y

Applying Stratosphere For Big Data Analytics

Marcus, Jochen, Moritz, Astrid, Volker
2013 Zenodo  
Moreover, these ideally declarative query specifications have to be optimized, parallelized and scheduled for processing on massively parallel data processing platforms.  ...  These examples include data cleansing and information extraction tasks, and a correlation analysis of microblogging and stock trade volume data that we describe in detail in this paper.  ...  Pact programs are optimized and compiled into data flow graphs, which are processed in parallel by the Nephele execution engine.  ... 
doi:10.5281/zenodo.1210857 fatcat:bmpnayiy7ndn5gn3vgoc2mpjx4

Nephele/PACTs

Dominic Battré, Stephan Ewen, Fabian Hueske, Odej Kao, Volker Markl, Daniel Warneke
2010 Proceedings of the 1st ACM symposium on Cloud computing - SoCC '10  
We present a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18].  ...  We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and deals with communication, synchronization and fault tolerance  ...  We also thank Guy Lohman for suggesting the name "PACT" for the contracts, as well as the anonymous reviewers for their constructive comments and suggestions.  ... 
doi:10.1145/1807128.1807148 dblp:conf/cloud/BattreEHKMW10 fatcat:7u6wfwtwjjaslc7nd5mdahlkjq

Efficient and Parallel Data Processing and Resource Allocation in the Cloud by using Nephele's Data Processing Framework

V. Saranya, S. Ramya, R.G. Suresh Kumar, T. Nalini
2016 International Journal of Grid and Distributed Computing  
A performance comparison with the well-known data processing framework Hadoop has been done.  ...  In this paper, we introduced Nephele, a data processing framework to exploit dynamic resource provisioning offered by IaaS clouds.  ...  [6] a parallel data processor centered around a programming model called Parallelization Contracts (PACTs) and parallel execution engine Nephele has been introduced.  ... 
doi:10.14257/ijgdc.2016.9.3.05 fatcat:2o72jgi42feqpn4lvavockif5e

Large-scale social-media analytics on stratosphere

Christoph Boden, Marcel Karnstedt, Miriam Fernandez, Volker Markl
2013 Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion  
Based on the popular example of role analysis, we present and illustrate how this massively parallel approach can be leveraged to scale out complex data-mining tasks, while providing a programming approach  ...  In this work, we propose and demonstrate the usage of the massively parallel data processing system Stratosphere, based on second order functions as an extended notion of the MapReduce paradigm, to provide  ...  Acknowledgements We thank Christoph Nagel and Stephan Pieper (now with http://www.surpreso.com) for their implementation support while at TU Berlin and the Stratosphere team.  ... 
doi:10.1145/2487788.2487916 dblp:conf/www/BodenKFM13 fatcat:oom64pvgtrbobfb4hygyvi2i4u

Peeking into the optimization of data flow programs with MapReduce-style UDFs

F. Hueske, M. Peters, A. Krettek, M. Ringwald, K. Tzoumas, V. Markl, J. Freytag
2013 2013 IEEE 29th International Conference on Data Engineering (ICDE)  
candidate data flows, the generation of physical execution plans, and their parallel execution.  ...  We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language.  ...  Such tasks are commonly executed on massively parallel systems, as indicated by the popularity of higher-level languages for structured data analysis [4, 3, 5].  ... 
doi:10.1109/icde.2013.6544927 dblp:conf/icde/HueskePKRTMF13 fatcat:bggujf4khvdlpbofkqewvjk6ye

The Stratosphere platform for big data analytics

Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters (+6 others)
2014 The VLDB journal  
We present Stratosphere, an open-source software stack for parallel data analysis.  ...  Acknowledgments We would like to thank the Master students that worked on the Stratosphere project and implemented many components of the system: Thomas Bodner, Christoph Brücke, Erik Nijkamp, Max Heimel  ...  The output of the PACT compiler is a parallel data flow program for Nephele, Stratosphere's parallel execution engine, and the third layer of the Stratosphere stack.  ... 
doi:10.1007/s00778-014-0357-y fatcat:ficnpssbvjdatjs3gewa4jdd7m

MapReduce and PACT - Comparing Data Parallel Programming Models

Alexander Alexandrov, Stephan Ewen, Max Heimel, Fabian Hueske, Odej Kao, Volker Markl, Erik Nijkamp, Daniel Warneke
2011 Datenbanksysteme für Business, Technologie und Web  
Next to parallel databases, new flavors of parallel data processors have recently emerged. One of the most discussed approaches is MapReduce.  ...  By the virtue of that programming model, the system can also apply a series of optimizations on the data flows before they are executed by the Nephele runtime system.  ...  The choice of the execution strategy is made by the PACT compiler, which translates PACT programs to parallel schedules for the Nephele runtime.  ... 
dblp:conf/btw/AlexandrovEHHKMNW09 fatcat:ifcvy2bhabes5otlwrgjdzp2y4

Iterative parallel data processing with stratosphere

Stephan Ewen, Sebastian Schelter, Kostas Tzoumas, Daniel Warneke, Volker Markl
2013 Proceedings of the 2013 international conference on Management of data - SIGMOD '13  
With increasing interest to run those algorithms on very large data sets, we see a need for new techniques to execute iterations in a massively parallel fashion.  ...  Iterative algorithms occur in many domains of data analysis, such as machine learning or graph analysis.  ...  Acknowledgments This research is funded by the German Research Foundation under grant "FOR 1036: Stratosphere - Information Management on the Cloud" and the European Union (EU) grant no. 257859 (project  ... 
doi:10.1145/2463676.2463693 dblp:conf/sigmod/EwenSTWM13 fatcat:ldthktqvwnfrbludeek7xak2ia

Nephele streaming: stream processing under QoS constraints at scale

Björn Lohrmann, Daniel Warneke, Odej Kao
2013 Cluster Computing  
As a proof of concept, we implemented our approach for our massively-parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online.  ...  At the same time, massively-parallel data processing systems like MapReduce or Dryad currently enjoy a tremendous popularity for data-intensive applications and have proven to scale to large numbers of  ...  As a proof of concept, we implemented this extension as part of our massively-parallel data processing framework Nephele [28], which runs data analysis jobs based on DAGs.  ... 
doi:10.1007/s10586-013-0281-8 fatcat:ug24ib66b5dzvkchncphbpvsam

Inside "Big Data management"

Vinayak Borkar, Michael J. Carey, Chen Li
2012 Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12  
this space for a number of years and are currently working together on "Big Data" problems.  ...  challenges posed by today's notion of "Big Data".  ...  for their collaborative support and for their provision of access to one of their research clusters.  ... 
doi:10.1145/2247596.2247598 dblp:conf/edbt/BorkarCL12 fatcat:3vkbs5k5kzfwlbvykahgcec44q

A Survey on Vertical and Horizontal Scaling Platforms for Big Data Analytics

Ahmed Hussein Ali (ICCI, Informatics Institute for Postgraduate Studies, Baghdad, Iraq), Mahmood Zaki Abdullah (Department of Computer Engineering, Al-Mustansiriyah University, Baghdad, Iraq)
2019 International Journal of Integrated Engineering  
Nephele/PACT: This is a parallel system for data processing which is made up of a programming platform known as parallelization contracts, and a scalable engine for parallel execution known as Nephele  ...  processing, Horizontal; Spark, 2014, Parallel computing platform, Horizontal; SAMOA, 2013, Machine learning platform for streaming data, Horizontal; Nephele/PACT, 2010, Parallel system for data processing  ... 
doi:10.30880/ijie.2019.11.06.015 fatcat:qbtbeq6ukbe5pmpmgkld3r33fe

Parallel data processing with MapReduce

Kyong-Ha Lee, Yoon-Joon Lee, Hyunsik Choi, Yon Dohn Chung, Bongki Moon
2012 SIGMOD record  
We also discuss the open issues and challenges raised on parallel data analysis with MapReduce.  ...  While MapReduce is used in many areas where massive data analysis is required, there are still debates on its performance, efficiency per node, and simple abstraction.  ...  INTRODUCTION In this age of data explosion, parallel processing is essential to processing a massive volume of data in a timely manner.  ... 
doi:10.1145/2094114.2094118 fatcat:kuvfuwss3fcmbf2d7oqqfibmoq

SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink

Oscar Ceballos, Carlos Alberto Ramírez Restrepo, María Constanza Pabón, Andres M. Castillo, Oscar Corcho
2021 Applied Sciences  
Existing SPARQL query engines and triple stores are continuously improved to handle more massive datasets.  ...  In this paper, we present a formal interpretation of some PACT transformations implemented in the Apache Flink DataSet API.  ...  Acknowledgments: The scalability test results on local cluster presented in this paper were obtained thanks to ViveLab Nariño, an initiative of Ministerio de Tecnologías de la Información y las Comunicaciones-MinTIC  ... 
doi:10.3390/app11157033 fatcat:kqtyvqp645bctbpriwhwb5qgxu

Map-Reduce Implementations: Survey and Performance Comparison

Zeba Khanam, Shafali Agarwal
2015 International Journal of Computer Science & Information Technology (IJCSIT)  
Map Reduce is used in different applications such as data mining, data analytics where massive data analysis is required, but still it is constantly being explored on different parameters such as performance  ...  Map Reduce has gained remarkable significance as a prominent parallel data processing tool in the research community, academia and industry with the spurt in volume of data that is to be analyzed.  ...  Earlier the solution was to engage the parallel database systems to deal with such massive amounts of data. But the drawback is that usually such database systems run on expensive high-end servers.  ... 
doi:10.5121/ijcsit.2015.7410 fatcat:egex75g65rg3nbialubagvllpi