Mix 'n' match multi-engine analytics

Katerina Doka, Nikolaos Papailiou, Victor Giannakouris, Dimitrios Tsoumakos, Nectarios Koziris
2016 2016 IEEE International Conference on Big Data (Big Data)  
As a remedy, we present IReS, the Intelligent Resource Scheduler for complex analytics workflows executed over multi-engine environments.  ...  Its optimizer incurs only marginal overhead to the workflow execution performance, managing to discover the optimal execution plan within a few seconds, even for large-scale workflow instances.  ...  The central notion behind IReS is to utilize detailed models of the costs and performance characteristics of analytics operators over multiple execution engines.  ... 
doi:10.1109/bigdata.2016.7840605 dblp:conf/bigdataconf/DokaPGTK16 fatcat:nimpr3poqfhapgxssfoulvva6u

IReS

Katerina Doka, Nikolaos Papailiou, Dimitrios Tsoumakos, Christos Mantas, Nectarios Koziris
2015 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data - SIGMOD '15  
To this end, we demonstrate IReS, the Intelligent Resource Scheduler for complex analytics workflows executed over multi-engine environments.  ...  IReS is then able to match distinct workflow parts to the execution and/or storage engine among the available ones in order to optimize with respect to a user-defined policy.  ...  The central notion behind the IReS platform is to create detailed models of the costs and performance characteristics of various analytics operations over multiple execution engines.  ... 
doi:10.1145/2723372.2735377 dblp:conf/sigmod/DokaPTMK15 fatcat:5dyeik3d45esje2mbqmwravm4q

SheerMP: Optimized Streaming Analytics-as-a-Service over Multi-site Multi-platform Settings

George Stamatakis, Antonios Kontaxakis, Alkis Simitsis, Nikos Giatrakos, Antonios Deligiannakis
2022 Zenodo  
In this paper, we demonstrate a prototype system that optimizes streaming analytics workflows across Big Data platforms and computer clusters.  ...  a wide variety of practical optimization and adaptive resource allocation scenarios over a variety of streaming Big Data platforms  ...  SheerMP automates optimization decisions, submits and migrates streaming analytics workflows, and monitors their execution over a variety of streaming Big Data platforms.  ... 
doi:10.5281/zenodo.6345356 fatcat:6wwzu3ijgfdzbfzb7grz25xlkm

The Case for Multi-Engine Data Analytics [chapter]

Dimitrios Tsoumakos, Christos Mantas
2014 Lecture Notes in Computer Science  
Such an environment further requires an intelligent management system for orchestrating and coordinating complex analytics tasks over the different available engines.  ...  In this paper we argue on the need of a multi-engine environment that will exploit the largely different models, cost and quality of the existing analytics engines.  ...  Modeling and Learning Engine In order for the scheduler to choose an optimized execution plan for an analytics task that will span (a) multiple execution engines and (b) multiple data stores, a detailed  ... 
doi:10.1007/978-3-642-54420-0_40 fatcat:4yd56fw575eptds4kohxkeqzqe

D4.1 Definition of Architecture for Extreme-Scale Analytics

Project Consortium Members
2019 Zenodo  
physical resources in a way that optimizes specific performance measures, (iii) providing real-time, interactive machine learning and data mining tools that can be leveraged by the designed workflows,  ...  to an omnibus solution for extreme-scale streaming analytics.  ...  Once a workflow is sent to the Optimizer, the Optimizer enumerates the space of possible and promising execution plans for the workflow and estimates plan costs using a dynamic cost model that predicts  ... 
doi:10.5281/zenodo.4034092 fatcat:g766jj6xwvesddsm3xs56l6mqq

Stubby: A Transformation-based Optimizer for MapReduce Workflows [article]

Harold Lim, Herodotos Herodotou, Shivnath Babu
2012 arXiv pre-print
However, automatic cost-based optimization of MapReduce workflows remains a challenge due to the multitude of interfaces, large size of the execution plan space, and the frequent unavailability of all  ...  Studies have shown that the gap in performance can be quite large between optimized and unoptimized workflows.  ...  Since many analytical workflows are run periodically, the optimization overhead of Stubby can be amortized over multiple workflow runs.  ... 
arXiv:1208.0082v1 fatcat:2qzmt6psizbylogpnqhm7tsyrq

The Many Faces of Data-centric Workflow Optimization: A Survey [article]

Georgia Kougka, Anastasios Gounaris, Alkis Simitsis
2017 arXiv pre-print
Firstly, to present the main dimensions of the relevant optimization problems and the types of optimizations that occur before flow execution.  ...  This survey focuses on data-centric workflows (or workflows for data analytics or data flows), where a key aspect is data passing through and getting manipulated by a sequence of steps.  ...  , such as the workflow monitoring and data provision components; iii) workflow execution plan (WEP) generation, where the workflow plan is optimized, e.g., through workflow refactoring and parallelization  ... 
arXiv:1701.07723v1 fatcat:fasmrggxfzb33ckcookphwdve4

Odyssey

Hakan Hacígümüş, Jagan Sankaranarayanan, Junichi Tatemura, Jeff LeFevre, Neoklis Polyzotis
2013 Proceedings of the VLDB Endowment  
Acknowledgment: We thank NEC's product and business teams for their generous support and contributions.  ...  The future phases of the system development plan to include additional execution engines, such as a columnar in-memory store.  ...  Currently, the system uses two execution engines, namely; Hadoop and the Relational DW.  ... 
doi:10.14778/2536222.2536249 fatcat:xtkgwtevx5cg5dmnyha3ate3di

Design and Development of an Adaptive Workflow-Enabled Spatial-Temporal Analytics Framework

Xiaorong Li, Rodrigo N. Calheiros, Sifei Lu, Long Wang, Henry Palit, Qin Zheng, Rajkumar Buyya
2012 2012 IEEE 18th International Conference on Parallel and Distributed Systems  
In this paper, we present the architecture of such a WfMS and evaluate it in terms of performance for execution of workflows in Clouds.  ...  Cloud computing is a suitable platform for execution of complex computational tasks and scientific simulations that are described in the form of workflows.  ...  Our proposed architecture is able to (i) share workflows from multiple users for analytics, (ii) harness a workflow management and scheduling engine for adaptive resource allocation and optimization, (  ... 
doi:10.1109/icpads.2012.141 dblp:conf/icpads/LiCLWPZB12 fatcat:b4hw676b6jalfo2fpet3lgnueu

Optimizing Resource Allocation for Scientific Workflows Using Advance Reservations [chapter]

Christoph Langguth, Heiko Schuldt
2010 Lecture Notes in Computer Science  
The recent interest in web services and service-oriented architectures has strongly facilitated the development of individual workflow activities as well as their composition and the distributed execution  ...  However, in many applications concurrent scientific workflows may be served by multiple competing providers, with each of them offering only limited resources.  ...  Finally, it is worth mentioning that the workflow is scheduled to be run by two different workflow engines: Activities 1 through 8 are executed by WF-A, then control (and data) is handed over to WF-B for  ... 
doi:10.1007/978-3-642-13818-8_30 fatcat:gr3ivljlsfhktgyjejdv77ogji

Bandwidth Optimization In Data Retrieval From Cloud Using Continuous Hive Language

S. Surendran, K. Prema
2016 International Journal Of Engineering And Computer Science  
The proposed system optimizes query execution plans and data replication to minimize bandwidth cost.  ...  Systems that compute SQL analytics over geographically distributed data operate by pulling all data to a central location.  ...  Query Deployment engine is responsible for deploying generated Optimized Query Plan (OQP) onto the processing nodes in the network topology.  ... 
doi:10.18535/ijecs/v5i6.20 fatcat:cpptv3anancorjumcjvxuqyoba

D5.1 Operator Cost Estimation and Workflow Optimisation Technology V1

Project Consortium Members
2020 Zenodo  
This deliverable presents techniques for optimizing workflow execution in terms of a set of optimization objectives (e.g., throughput, resource utilization) of extreme-scale analytics across different,  ...  It ingests statistics collected by the Manager Component to perform cost estimations and judge the performance of alternative execution plans i.e., the Optimizer Component transforms the logical workflow  ...  The goal of the optimizer is to receive the initial workflow drawn by the user, i.e., a logical workflow, and optimize its execution over multiple, networked clusters, Big Data platforms and admissible  ... 
doi:10.5281/zenodo.4034108 fatcat:t22h4qqgjfbsporpl4zkf5c2qm

Musketeer

Ionel Gog, Malte Schwarzkopf, Natacha Crooks, Matthew P. Grosvenor, Allen Clement, Steven Hand
2015 Proceedings of the Tenth European Conference on Computer Systems - EuroSys '15  
This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g.,  ...  Musketeer speeds up realistic workflows by up to 9× by targeting different execution engines, without requiring any manual effort.  ...  However, some execution engines have limited expressivity and therefore require the data-flow DAG to be partitioned into multiple jobs.  ... 
doi:10.1145/2741948.2741968 dblp:conf/eurosys/GogSCGCH15 fatcat:j67jem3ohzef7pwh2ymldjwehy

A multiple-objective workflow scheduling framework for cloud data analytics

Orachun Udomkasemsub, Li Xiaorong, Tiranee Achalakul
2012 2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE)  
Our designed framework uses a meta-heuristics method, called Artificial Bee Colony (ABC), to create an optimized scheduling plan. The framework allows multiple constraints and objectives to be set.  ...  In this paper, we proposed a workflow scheduling framework that can efficiently schedule series workflows with multiple objectives onto a cloud system.  ...  The framework allows multiple objectives and constraints to be set in order to optimize the performance of data analytics workflow scheduling in cloud environments.  ... 
doi:10.1109/jcsse.2012.6261985 fatcat:2cb66vgxsvdyfnxliifonmlyya

Large-scale social-media analytics on stratosphere

Christoph Boden, Marcel Karnstedt, Miriam Fernandez, Volker Markl
2013 Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion  
that eases the formulation of complete analytical workflows.  ...  Consequently, a wide range of analytics has been proposed to understand, steer, and exploit the mechanics and laws driving their functionality and creating the resulting benefits.  ...  Acknowledgements We thank Christoph Nagel and Stephan Pieper (now with http://www.surpreso.com) for their implementation support while at TU Berlin and the Stratosphere team.  ... 
doi:10.1145/2487788.2487916 dblp:conf/www/BodenKFM13 fatcat:oom64pvgtrbobfb4hygyvi2i4u