
Labyrinth: Compiling Imperative Control Flow to Parallel Dataflows [article]

Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Loránd Madai-Tahy, Volker Markl
2018 arXiv   pre-print
Parallel dataflow systems have become a standard technology for large-scale data analytics.  ...  In this paper, we introduce Labyrinth, a method to compile programs written using imperative control flow constructs to a single dataflow job, which executes the whole program, including all iteration  ...  Labyrinth Table 1: Control flow handling approaches in parallel dataflow systems.  ... 
arXiv:1809.06845v3 fatcat:gphsfrpxavf4dgbt3i3472qv4m


2007 GCA 2007  
The dataflow engine dispatches the tasks onto candidate distributed computing resources in the system, and manages failures and load balancing problems in a transparent manner.  ...  This paper presents the design, implementation and evaluation of a dataflow system, including a dataflow programming model and a dataflow engine, for coarse-grained distributed data intensive applications  ...  (b) Matrix Vector Iterative Multiplication In this benchmark, one matrix is multiplied with one vector in an iterative manner.  ... 
doi:10.1142/9789812708823_0003 fatcat:4cn5kscwunddfk32qncz6n4gsy

Tagged Dataflow: a Formal Model for Iterative Map-Reduce

Angelos Charalambidis, Nikolaos Papaspyrou, Panos Rondogiannis
2014 International Conference on Extending Database Technology  
In this paper, we consider the recent iterative extensions of the Map-Reduce framework and we argue that they would greatly benefit from the research work that was conducted in the area of dataflow computing  ...  In particular, we suggest that the tagged-dataflow model of computation can be used as the formal framework behind existing and future iterative generalizations of Map-Reduce.  ...  We strongly believe that the further investigation of the interactions between dataflow and the novel approaches to distributed processing that have resulted from Map-Reduce, will prove very rewarding.  ... 
dblp:conf/edbt/CharalambidisPR14 fatcat:c3oxui4qbvevniagirac2j3kxa

RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem [article]

Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica
2021 arXiv   pre-print
In this paper, we re-examine the challenges posed by distributed RL and try to view it through the lens of an old idea: distributed dataflow.  ...  We propose RLlib Flow, a hybrid actor-dataflow programming model for distributed RL, and validate its practicality by porting the full suite of algorithms in RLlib, a widely adopted distributed RL library  ...  RLlib Flow consists of a set of dataflow operators that produce and consume distributed iterators [11].  ... 
arXiv:2011.12719v4 fatcat:o7euvwohgrgtrazko3niasln4e

Apache Flink™: Stream and Batch Processing in a Single Engine

Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, Kostas Tzoumas
2015 IEEE Data Engineering Bulletin  
It is becoming more and more apparent, however, that a huge number of today's large-scale data processing use cases handle data that is, in reality, produced continuously over time.  ...  (machine learning, graph analysis) can be expressed and executed as pipelined fault-tolerant dataflows.  ...  System Architecture In this section we lay out the architecture of Flink as a software stack and as a distributed system.  ... 
dblp:journals/debu/CarboneKEMHT15 fatcat:xzgvdr6pljctzb75xecvg74m3q

Supporting Reconfigurable Parallel Multimedia Applications [chapter]

Maik Nijhuis, Herbert Bos, Henri E. Bal
2006 Lecture Notes in Computer Science  
We present Hinch, a runtime system for multimedia applications, that efficiently exploits parallelism by running the application in a dataflow style.  ...  Programming multimedia applications for System-on-Chip (SoC) architectures is difficult because streaming communication, user event handling, reconfiguration, and parallelism have to be dealt with.  ...  A '-' in this column means the system targets shared memory architectures. The events column indicates if the system has support for handling asynchronous user events.  ... 
doi:10.1007/11823285_80 fatcat:dqkaprt77fe5ze4bwklxooxify

Hierarchical Dataflow Model with Automated File Management for Engineering and Scientific Applications

Alexey M. Nazarenko, Alexander A. Prokhorov
2015 Procedia Computer Science  
This paper introduces a workflow model targeted to provide natural automation and distributed execution of complex iterative computation processes, where the calculation chain contains multiple task-specific  ...  Adopted approach to achieve the required level of automation is to use one of the many available scientific and engineering workflow systems, which can be based on different workflow models.  ...  Hierarchical Dataflow Model with Automated File Management Nazarenko and Prokhorov  ... 
doi:10.1016/j.procs.2015.11.056 fatcat:wotdcbjx7jgh7gntkjdmm564om

A survey paper on logical perspective to manage BigData with incremental map reduce

Snehal Dhamelia
2016 International Journal Of Engineering And Computer Science  
That means incremental MapReduce processes big data in less time and stores it in a more optimized form.  ...  To improve the time of processing big data and optimizing data content of big data we applied PageRank and k-means iteratively along with MapReduce.  ...  Dataflow Systems such as CIEL, Spark, Spark Streaming and Optimus extend acyclic batch dataflow to allow dynamic modification of the dataflow graph, and thus support iteration and incremental computation  ... 
doi:10.18535/ijecs/v5i11.40 fatcat:hndorr3lqrbl3hixocawzqbczu

Dynamic control flow in large-scale machine learning

Yuan Yu, Peter Hawkins, Michael Isard, Manjunath Kudlur, Rajat Monga, Derek Murray, Xiaoqiang Zheng, Martín Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis (+3 others)
2018 Proceedings of the Thirteenth EuroSys Conference on - EuroSys '18  
We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system.  ...  These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system.  ...  In particular, Skye Wanderman-Milne provided helpful comments on a draft of this paper. We also thank our shepherd, Peter Pietzuch, for his guidance in improving the paper.  ... 
doi:10.1145/3190508.3190551 dblp:conf/eurosys/YuABBBDDGHHIKMM18 fatcat:5u4gcsi5fba33mv2nyni32h424

Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation [article]

Dominik Scheinert, Houkun Zhu, Lauritz Thamsen, Morgan K. Geldenhuys, Jonathan Will, Alexander Acker, Odej Kao
2021 arXiv   pre-print
Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics.  ...  Yet, in many situations, dynamic scaling can be used to meet formulated runtime targets despite significant performance variance.  ...  As the management of computational resources is often not directly handled by these distributed dataflow systems, they commonly make use of resource management systems such as YARN [3] or Kubernetes  ... 
arXiv:2108.12211v2 fatcat:towol5fwl5depo4v3wxjtrq4bm

Spinning fast iterative data flows

Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, Volker Markl
2012 Proceedings of the VLDB Endowment  
Third, dataflows seem to be a well adopted abstraction for distributed algorithms, as shown by their increased popularity in the database and machine learning community [5, 35].  ...  In our experiments, the improved dataflow system is highly competitive with specialized systems while maintaining a transparent and unified dataflow abstraction. existing dataflow systems execute incremental  ...  DATAFLOW SYSTEMS We have implemented full and incremental iterations in the Stratosphere dataflow system [7].  ... 
doi:10.14778/2350229.2350245 fatcat:mlgvsxdlrngntdm33n2drq6h5m

Shared Arrangements: practical inter-query sharing for streaming dataflows [article]

Frank McSherry, Andrea Lattuada, Malte Schwarzkopf, Timothy Roscoe
2020 arXiv   pre-print
Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries.  ...  We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams  ...  Iteration The iteration operator is essentially unchanged from Naiad's Differential Dataflow implementation.  ... 
arXiv:1812.02639v3 fatcat:7jvlhrceofahpikttojhcxsepq

A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one

Cristian Ramon-Cortes, Francesc Lordan, Jorge Ejarque, Rosa M. Badia
2020 Future generations computer systems  
Also, it provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python; enabling complex workflows to handle any data type without dealing directly with  ...  We propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using  ...  Apache Storm [23] is a distributed real-time computation system based on the master-worker architecture and used in real-time analytics, online machine learning, continuous computation, and distributed  ... 
doi:10.1016/j.future.2020.07.007 fatcat:24a4z2fl6jgujkxp5vdeu4xo6m

Large-scale distributed L-BFGS

Maryam M. Najafabadi, Taghi M. Khoshgoftaar, Flavio Villanustre, John Holt
2017 Journal of Big Data  
In this paper, we present a parallelized implementation of the L-BFGS algorithm on a distributed system which includes a cluster of commodity computing machines.  ...  We use open source HPCC Systems (High-Performance Computing Cluster) platform as the underlying distributed system to implement the L-BFGS algorithm.  ...  Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Received: 24 April 2017 Accepted: 6 July 2017  ... 
doi:10.1186/s40537-017-0084-5 fatcat:xf63nerkhzdgzdvli2iqmnwf2m


Andreas Diavastos, Pedro Trancoso
2017 ACM Transactions on Architecture and Code Optimization (TACO)  
SWITCHES is a task-based dataflow runtime that implements a lightweight distributed triggering system for runtime dependence resolution and uses static scheduling and compile-time assignment policies to  ...  Unlike other systems, the granularity of loop-tasks can be increased to favor data-locality, even when having dependences across different loops.  ...  ACKNOWLEDGMENTS The authors would like to thank The Cyprus Institute and the Cy-Tera HPC Facility for providing the hardware resources (Intel Xeon Phi accelerators) used in the evaluation of this work.  ... 
doi:10.1145/3127068 fatcat:adv63rca7zc5xgfa3yals6h33q