Labyrinth: Compiling Imperative Control Flow to Parallel Dataflows
[article]
2018
arXiv
pre-print
Parallel dataflow systems have become a standard technology for large-scale data analytics. ...
In this paper, we introduce Labyrinth, a method to compile programs written using imperative control flow constructs to a single dataflow job, which executes the whole program, including all iteration ...
Labyrinth Table 1: Control flow handling approaches in parallel dataflow systems. ...
arXiv:1809.06845v3
fatcat:gphsfrpxavf4dgbt3i3472qv4m
A DATAFLOW MODEL FOR .NET-BASED GRID COMPUTING SYSTEMS
2007
GCA 2007
The dataflow engine dispatches the tasks onto candidate distributed computing resources in the system, and manages failures and load balancing problems in a transparent manner. ...
This paper presents the design, implementation and evaluation of a dataflow system, including a dataflow programming model and a dataflow engine, for coarse-grained distributed data intensive applications ...
(b) Matrix Vector Iterative Multiplication: In this benchmark, one matrix is multiplied with one vector in an iterative manner. ...
doi:10.1142/9789812708823_0003
fatcat:4cn5kscwunddfk32qncz6n4gsy
Tagged Dataflow: a Formal Model for Iterative Map-Reduce
2014
International Conference on Extending Database Technology
In this paper, we consider the recent iterative extensions of the Map-Reduce framework and we argue that they would greatly benefit from the research work that was conducted in the area of dataflow computing ...
In particular, we suggest that the tagged-dataflow model of computation can be used as the formal framework behind existing and future iterative generalizations of Map-Reduce. ...
We strongly believe that the further investigation of the interactions between dataflow and the novel approaches to distributed processing that have resulted from Map-Reduce, will prove very rewarding. ...
dblp:conf/edbt/CharalambidisPR14
fatcat:c3oxui4qbvevniagirac2j3kxa
RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem
[article]
2021
arXiv
pre-print
In this paper, we re-examine the challenges posed by distributed RL and try to view it through the lens of an old idea: distributed dataflow. ...
We propose RLlib Flow, a hybrid actor-dataflow programming model for distributed RL, and validate its practicality by porting the full suite of algorithms in RLlib, a widely adopted distributed RL library ...
RLlib Flow consists of a set of dataflow operators that produce and consume distributed iterators [11]. ...
arXiv:2011.12719v4
fatcat:o7euvwohgrgtrazko3niasln4e
Apache Flink™: Stream and Batch Processing in a Single Engine
2015
IEEE Data Engineering Bulletin
It is becoming more and more apparent, however, that a huge number of today's large-scale data processing use cases handle data that is, in reality, produced continuously over time. ...
(machine learning, graph analysis) can be expressed and executed as pipelined fault-tolerant dataflows. ...
System Architecture: In this section we lay out the architecture of Flink as a software stack and as a distributed system. ...
dblp:journals/debu/CarboneKEMHT15
fatcat:xzgvdr6pljctzb75xecvg74m3q
Supporting Reconfigurable Parallel Multimedia Applications
[chapter]
2006
Lecture Notes in Computer Science
We present Hinch, a runtime system for multimedia applications, that efficiently exploits parallelism by running the application in a dataflow style. ...
Programming multimedia applications for System-on-Chip (SoC) architectures is difficult because streaming communication, user event handling, reconfiguration, and parallelism have to be dealt with. ...
A '-' in this column means the system targets shared memory architectures. The events column indicates if the system has support for handling asynchronous user events. ...
doi:10.1007/11823285_80
fatcat:dqkaprt77fe5ze4bwklxooxify
Hierarchical Dataflow Model with Automated File Management for Engineering and Scientific Applications
2015
Procedia Computer Science
This paper introduces a workflow model targeted to provide natural automation and distributed execution of complex iterative computation processes, where the calculation chain contains multiple task-specific ...
The adopted approach to achieving the required level of automation is to use one of the many available scientific and engineering workflow systems, which can be based on different workflow models. ...
Hierarchical Dataflow Model with Automated File Management Nazarenko and Prokhorov ...
doi:10.1016/j.procs.2015.11.056
fatcat:wotdcbjx7jgh7gntkjdmm564om
A survey paper on logical perspective to manage BigData with incremental map reduce
2016
International Journal of Engineering and Computer Science
That means incremental MapReduce processes big data in less time and stores it in a more optimized form. ...
To improve the time of processing big data and optimizing data content of big data we applied PageRank and k-means iteratively along with MapReduce. ...
Dataflow Systems such as CIEL, Spark, Spark Streaming and Optimus extend acyclic batch dataflow to allow dynamic modification of the dataflow graph, and thus support iteration and incremental computation ...
doi:10.18535/ijecs/v5i11.40
fatcat:hndorr3lqrbl3hixocawzqbczu
Dynamic control flow in large-scale machine learning
2018
Proceedings of the Thirteenth EuroSys Conference (EuroSys '18)
We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system. ...
These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. ...
In particular, Skye Wanderman-Milne provided helpful comments on a draft of this paper. We also thank our shepherd, Peter Pietzuch, for his guidance in improving the paper. ...
doi:10.1145/3190508.3190551
dblp:conf/eurosys/YuABBBDDGHHIKMM18
fatcat:5u4gcsi5fba33mv2nyni32h424
Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation
[article]
2021
arXiv
pre-print
Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. ...
Yet, in many situations, dynamic scaling can be used to meet formulated runtime targets despite significant performance variance. ...
As the management of computational resources is often not directly handled by these distributed dataflow systems, they commonly make use of resource management systems such as YARN [3] or Kubernetes ...
arXiv:2108.12211v2
fatcat:towol5fwl5depo4v3wxjtrq4bm
Spinning fast iterative data flows
2012
Proceedings of the VLDB Endowment
Third, dataflows seem to be a well-adopted abstraction for distributed algorithms, as shown by their increased popularity in the database and machine learning community [5, 35]. ...
In our experiments, the improved dataflow system is highly competitive with specialized systems while maintaining a transparent and unified dataflow abstraction. ... existing dataflow systems execute incremental ...
DATAFLOW SYSTEMS: We have implemented full and incremental iterations in the Stratosphere dataflow system [7]. ...
doi:10.14778/2350229.2350245
fatcat:mlgvsxdlrngntdm33n2drq6h5m
Shared Arrangements: practical inter-query sharing for streaming dataflows
[article]
2020
arXiv
pre-print
Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. ...
We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams ...
Iteration: The iteration operator is essentially unchanged from Naiad's Differential Dataflow implementation. ...
arXiv:1812.02639v3
fatcat:7jvlhrceofahpikttojhcxsepq
A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one
2020
Future Generation Computer Systems
Also, it provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python; enabling complex workflows to handle any data type without dealing directly with ...
We propose a way to extend task-based management systems to support continuous input and output data to enable the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using ...
Apache Storm [23] is a distributed real-time computation system based on the master-worker architecture and used in real-time analytics, online machine learning, continuous computation, and distributed ...
doi:10.1016/j.future.2020.07.007
fatcat:24a4z2fl6jgujkxp5vdeu4xo6m
Large-scale distributed L-BFGS
2017
Journal of Big Data
In this paper, we present a parallelized implementation of the L-BFGS algorithm on a distributed system which includes a cluster of commodity computing machines. ...
We use open source HPCC Systems (High-Performance Computing Cluster) platform as the underlying distributed system to implement the L-BFGS algorithm. ...
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Received: 24 April 2017 Accepted: 6 July 2017 ...
doi:10.1186/s40537-017-0084-5
fatcat:xf63nerkhzdgzdvli2iqmnwf2m
SWITCHES is a task-based dataflow runtime that implements a lightweight distributed triggering system for runtime dependence resolution and uses static scheduling and compile-time assignment policies to ...
Unlike other systems, the granularity of loop-tasks can be increased to favor data-locality, even when having dependences across different loops. ...
ACKNOWLEDGMENTS The authors would like to thank The Cyprus Institute and the Cy-Tera HPC Facility for providing the hardware resources (Intel Xeon Phi accelerators) used in the evaluation of this work. ...
doi:10.1145/3127068
fatcat:adv63rca7zc5xgfa3yals6h33q