
Loading databases using dataflow parallelism

Tom Barclay, Robert Barnes, Jim Gray, Prakash Sundaresan
1994 SIGMOD Record
This paper describes a parallel database load prototype for Digital's Rdb database product. The prototype takes a dataflow approach to database parallelism.  ...  Summary and Conclusions: Dataflow parallelism is the most promising approach to parallelize database operations. The prototype we built automates much of the parallel database loading task.  ... 
doi:10.1145/190627.190647 fatcat:zzwk7s3b7zdyjiqn7y5nvxlohi

Analyzing related raw data files through dataflows

Vítor Silva, Daniel de Oliveira, Patrick Valduriez, Marta Mattoso
2015 Concurrency and Computation  
When the SWfMS is dataflow-aware, it can register provenance data and the relationships among elements of raw data files altogether in a database which is useful to access the contents of a large number  ...  Database Management Systems (DBMS) are not suited for this, because they require loading the raw data and structuring it, which gets heavy at large-scale.  ...  Just selected raw data is extracted and loaded into a provenance database to be further queried.  ... 
doi:10.1002/cpe.3616 fatcat:6xtx2o277ra5ppr2bcf7gdkhna


Vítor Silva, Daniel de Oliveira, Patrick Valduriez, Marta Mattoso
2018 Proceedings of the VLDB Endowment  
We will also encourage attendees to use DfAnalyzer for their own applications.  ...  DfAnalyzer provides lightweight dataflow monitoring components to be invoked by high performance applications.  ...  Data Loading and Dataflow Analysis: As provenance and raw data have been extracted/indexed, PDE loads such data into DfDB database, which is managed by MonetDB.  ... 
doi:10.14778/3229863.3236265 fatcat:6llryj4m4ra6rhs73a6yqkzxuu

MB++: An Integrated Architecture for Pervasive Computing and High-Performance Computing

David J. Lillethun, David Hilley, Seth Horrigan, Umakishore Ramachandran
2007 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007)  
Further, we show that our implementation can exploit opportunities for parallelism in dataflow graphs, as well as efficiently sharing common subgraphs between dataflow graphs.  ...  The transformation engine executes dataflow graphs of transformations on high-performance computing resources.  ...  Dataflow Graph Parallelization: This experiment demonstrates the benefit of parallelizing dataflow graphs.  ... 
doi:10.1109/rtcsa.2007.47 dblp:conf/rtcsa/LillethunHHR07 fatcat:oyjfedl4xneftj7gd7aziw3cnq

Shared Arrangements: practical inter-query sharing for streaming dataflows [article]

Frank McSherry and Andrea Lattuada and Malte Schwarzkopf and Timothy Roscoe
2020 arXiv pre-print
Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries.  ...  This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling.  ...  Timely Dataflow is a model for data-parallel dataflow execution, introduced by Naiad [28] .  ... 
arXiv:1812.02639v3 fatcat:7jvlhrceofahpikttojhcxsepq

Beyond Dataflow

Borut Robič, Jurij Šilc, Theo Ungerer
2000 Journal of Computing and Information Technology  
Also some other techniques for combining control-flow and dataflow emerged, such as coarse-grain dataflow, dataflow with complex machine operations, RISC dataflow, and micro dataflow.  ...  This paper presents some recent advanced dataflow architectures.  ...  Stollman Dataflow Machine: The Stollman dataflow machine (Glück-Hiltrop et al., 1989) is a coarse-grain dataflow architecture directed towards database applications.  ... 
doi:10.2498/cit.2000.02.01 fatcat:3bonvcsg6jbnzj3uouzwkc5tcm

Fluχ: a quality-driven dataflow model for data intensive computing

Sérgio Esteves, João Nuno Silva, Luís Veiga
2013 Journal of Internet Services and Applications  
of the dataflow, that would minimize the number of executions (processing steps), reducing overhead and augmenting performance, while maintaining the dataflow processing results within certain coverage  ...  Also, this notion can be specially beneficial in cloud computing, where a dataflow computing service (SaaS) may provide certain QoD levels for different budgets.  ...  This allows for automatic parallelization. MapReduce is used in large clusters to analyze in parallel huge data sets in domains such as web log and graph analysis.  ... 
doi:10.1186/1869-0238-4-12 fatcat:jz3mjxua4zau7mgvlhdxczaomq

Optimizing ETL Dataflow Using Shared Caching and Parallelization Methods [article]

Xiufeng Liu
2014 arXiv pre-print
In order to minimize the time and the resources required by ETL dataflows, this paper presents a framework to optimize dataflows using shared cache and parallelization techniques.  ...  Extract-Transform-Load (ETL) handles large amount of data and manages workload through dataflows.  ...  We first evaluate the speedup when the pipeline parallelization is applied to T1. We execute the dataflow when the fact table is loaded with 2, 4, and 8 GB data sets, respectively.  ... 
arXiv:1409.1639v1 fatcat:jof6rd5ukze4jmjpvgyvx3ek6q

Design and analysis of data management in scalable parallel scripting

Zhao Zhang, Daniel S. Katz, Justin M. Wozniak, Allan Espinosa, Ian Foster
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks.  ...  We co-design the data management system with the data-aware scheduler to enable dataflow pattern identification and automatic optimization.  ...  David Mathog (Caltech) for his support with parallel BLAST, and the ALCF support team. Work by Katz was supported by the National Science Foundation while working at the Foundation.  ... 
doi:10.1109/sc.2012.44 dblp:conf/sc/ZhangKWEF12 fatcat:mdukzucq7jf33ebsjggjetd7qi

Towards Multiverse Databases

Alana Marzoev, Lara Timbó Araújo, Malte Schwarzkopf, Samyukta Yagati, Eddie Kohler, Robert Morris, M. Frans Kaashoek, Sam Madden
2019 Proceedings of the Workshop on Hot Topics in Operating Systems - HotOS '19  
Our early prototype supports thousands of parallel universes on a single server.  ...  We propose an efficient design based on a joint dataflow across "universes" that combines global, shared computation and cached state with individual, per-user processing and state.  ...  Specifically, scalable, parallel streaming dataflow computing systems now support partially-stateful and dynamically-changing dataflows [11] .  ... 
doi:10.1145/3317550.3321425 dblp:conf/hotos/MarzoevASYKMKM19 fatcat:juer3mguybaklbhudkuou4o4aq


2007 GCA 2007  
The dataflow engine dispatches the tasks onto candidate distributed computing resources in the system, and manages failures and load balancing problems in a transparent manner.  ...  The dataflow programming model provides users with a transparent interface for application programming and execution management in a parallel and distributed computing environment.  ...  River [13] provides a dataflow programming environment for scientific database like applications on clusters through a visual interface.  ... 
doi:10.1142/9789812708823_0003 fatcat:4cn5kscwunddfk32qncz6n4gsy

Algebraic dataflows for big data analysis

Jonas Dias, Eduardo Ogasawara, Daniel de Oliveira, Fabio Porto, Patrick Valduriez, Marta Mattoso
2013 2013 IEEE International Conference on Big Data  
We illustrate how a big data processing dataflow can be modeled using the algebra.  ...  In this paper, we propose an approach for big data analysis based on algebraic workflows, which yields optimization and parallel execution of activities and supports user steering using provenance queries  ...  It extends concepts from dataflow languages and parallel databases to model workflows that can be executed in parallel.  ... 
doi:10.1109/bigdata.2013.6691567 dblp:conf/bigdataconf/DiasOOPVM13 fatcat:n5xnz3jx6bgfhniujhc5n27jia

Streaming vs. Functions: A Cost Perspective on Cloud Event Processing [article]

Tobias Pfandzelter and Sören Henning and Trever Schirmer and Wilhelm Hasselbring and David Bermbach
2022 arXiv pre-print
We implement stateless and stateful workflows from the Theodolite benchmarking suite using cloud FaaS and DSP.  ...  Despite their architectural differences, both can be used to model and implement loosely-coupled job graphs. In this paper, we consider the selection of FaaS and DSP from a cost perspective.  ...  This results in 10 windows per emulated sensor that are maintained in parallel.  ... 
arXiv:2204.11509v1 fatcat:dafvucxyhvbfzfxoomm3jk2wpe

Stream-Dataflow Acceleration

Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, Karthikeyan Sankaralingam
2017 Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17  
The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very-low power and area overhead.  ...  ACKNOWLEDGMENTS: We would first like to thank the anonymous reviewers for their detailed questions and suggestions which helped us to clarify the presentation.  ...  It uses a stream-based abstraction for accessing database columns, and a dataflow abstraction for performing computations.  ... 
doi:10.1145/3079856.3080255 dblp:conf/isca/NowatzkiGAS17 fatcat:xm36xv6cbfevveabvmpafgjtli

Parallel database systems

David J. DeWitt, Jim Gray
1990 SIGMOD Record
Parallel database machine architectures based on exotic hardware have evolved to a parallel database systems running atop a parallel dataflow software architecture based on conventional shared-nothing  ...  These new designs provide speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.  ...  Scans, aggregates, updates, and deletes are parallelized. In addition several utilities use parallelism (e.g. load, reorg, ...).  ... 
doi:10.1145/122058.122071 fatcat:4mjtunvs2bav5c27z2jjdeu4zy
Showing results 1–15 of 3,434