A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma
2017
2017 IEEE 33rd International Conference on Data Engineering (ICDE)
A main contribution of this work is extending the Spark framework with a mediation layer, based on logical schema mappings and query rewriting, to facilitate data analysis over a consistent harmonized ...
Our architecture is based on the Apache Kafka, Spark and Hadoop frameworks and PostgreSQL DBMS. ...
The Apache Spark SQL engine [1] handles the distribution of the queries and pushes raw sensor data and harmonized data to the Storage layer. The Storage layer manages permanent data storage. ...
doi:10.1109/icde.2017.198
pmid:29731601
pmcid:PMC5935488
dblp:conf/icde/StripelisACEH17
fatcat:3i5xtfrrlrdfrbzz6ndpdoct7i
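The first entry describes a mediation layer that uses logical schema mappings and query rewriting over a harmonized schema. Here is a minimal Python sketch of that idea. It assumes simple attribute-to-column mappings. The source names `sensor_a` and `sensor_b` and their columns are invented for illustration. The snippet does not describe the paper's actual mapping formalism or rewriting algorithm, and a real deployment would hand rewritten queries to Spark SQL rather than assemble strings.

```python
from dataclasses import dataclass

# Hypothetical mappings: harmonized attribute -> native column name, per source.
MAPPINGS = {
    "sensor_a": {"patient_id": "pid", "pm25": "pm2_5_ugm3", "ts": "timestamp"},
    "sensor_b": {"patient_id": "subject", "pm25": "fine_particles", "ts": "time"},
}


@dataclass
class Query:
    """A query over the harmonized schema: projected attributes plus equality filters."""
    select: list
    where: dict


def rewrite(query, source):
    """Rewrite a harmonized-schema query into SQL over one source's native columns."""
    mapping = MAPPINGS[source]
    cols = ", ".join(f"{mapping[a]} AS {a}" for a in query.select)
    sql = f"SELECT {cols} FROM {source}"
    if query.where:
        # repr() quoting is for illustration only; it is not safe SQL escaping.
        conds = " AND ".join(f"{mapping[a]} = {v!r}" for a, v in query.where.items())
        sql += f" WHERE {conds}"
    return sql


def rewrite_union(query):
    """Answer the harmonized query over every source as a UNION ALL of per-source rewrites."""
    return "\nUNION ALL\n".join(rewrite(query, s) for s in MAPPINGS)


q = Query(select=["patient_id", "pm25"], where={"patient_id": "p1"})
print(rewrite_union(q))
```

Because of the aliases, every per-source query returns the same harmonized column names. The results can therefore be unioned without further renaming.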
Collaborative Cloud Computing Framework for Health Data with Open Source Technologies
[article]
2020
arXiv
pre-print
We propose a novel architecture for software-hardware-data ecosystem using open source technologies such as Apache Hadoop, Kubernetes and JupyterHub in a distributed environment. ...
kernel, iii) scalable, and iv) compliant with the HIPAA privacy law. ...
UlTraMan is an integrated platform of extended Apache Spark in both data storage and computing aspects. The extension has been made by integrating Spark with Chronicle Map and enhanced MapReduce. ...
arXiv:2007.10498v1
fatcat:gucnqk6gorfsni6qsm5cx3bjja
NFDI Data Integration
2022
Zenodo
In addition, this project develops a recommendation for an architecture and provides the next steps for a reference implementation. ...
Our envisaged work plan follows an agile approach with an iterative development of results and their validation against an increasing number of use stories obtained from various NFDI consortia. ...
data integration workflows would e. g. be Apache Spark). ...
doi:10.5281/zenodo.6518771
fatcat:762le3dojvd2ji7f3x6owbuyca
NFDI Data Integration
2022
Zenodo
In addition, this project develops a recommendation for an architecture and provides the next steps for a reference implementation. ...
Our envisaged work plan follows an agile approach with an iterative development of results and their validation against an increasing number of use stories obtained from various NFDI consortia. ...
data integration workflows would e. g. be Apache Spark). ...
doi:10.5281/zenodo.6519590
fatcat:3r3voqdfevbonjai7xywffpweq
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
[article]
2018
arXiv
pre-print
Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter ...
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache ...
DE-AC05-00OR22725 with the U.S. Department of Energy. ...
arXiv:1802.10233v1
fatcat:c66gmtyyz5gkbalnyqbhcbnl7u
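The Calcite snippets describe an optimizer assembled from many independent rewrite rules. A few lines of Python can sketch that idea. This is a toy analogue and does not model Calcite's actual Java planner API. Plans are nested tuples, and each rule returns either a rewritten node or `None`. The optimizer applies the rules bottom-up until the plan stops changing. The two rules shown are illustrative assumptions: one merges stacked filters and one drops an always-true filter.

```python
# Plan nodes: ("scan", table), ("filter", child, predicate), ("project", child, columns)


def merge_filters(node):
    """Filter(Filter(x, p1), p2) -> Filter(x, p1 AND p2)."""
    if node[0] == "filter" and node[1][0] == "filter":
        _, (_, child, p1), p2 = node
        return ("filter", child, ("and", p1, p2))
    return None


def drop_true_filter(node):
    """Filter(x, TRUE) -> x."""
    if node[0] == "filter" and node[2] is True:
        return node[1]
    return None


RULES = [merge_filters, drop_true_filter]


def rewrite_once(node):
    """One bottom-up pass: optimize the child first, then try each rule on this node."""
    if node[0] in ("filter", "project"):
        node = (node[0], rewrite_once(node[1]), node[2])
    for rule in RULES:
        out = rule(node)
        if out is not None:
            return out
    return node


def optimize(node, max_iters=100):
    """Repeat passes until a fixpoint (no rule fires) or the iteration limit is reached."""
    for _ in range(max_iters):
        new = rewrite_once(node)
        if new == node:
            return node
        node = new
    return node
```

Keeping each rule small and independent makes it possible to add, remove, or reorder rules without touching the driver loop. That modularity is the point the snippet makes.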
Apache Calcite
2018
Proceedings of the 2018 International Conference on Management of Data - SIGMOD '18
Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter ...
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache ...
DE-AC05-00OR22725 with the U.S. Department of Energy. ...
doi:10.1145/3183713.3190662
dblp:conf/sigmod/BegoliCHML18
fatcat:rbov7kz6svf7ngv3otxarfnuwy
Benchmarking Distributed Stream Processing Engines
[article]
2018
arXiv
pre-print
In this paper, we propose a framework to evaluate the performance of three SDPSs, namely Apache Storm, Apache Spark, and Apache Flink. ...
First, we give a definition of latency and throughput for stateful operators. ...
StreamBench [21], proposes a method to measure the throughput and latency of a SDPS with the use of a mediator system between the data source and the SUT. ...
arXiv:1802.08496v1
fatcat:zfdxcy3eajcv7eqafymethtvmq
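The benchmarking entry defines latency and throughput for stateful operators. Below is a minimal sketch built on one common convention, which is assumed here. The event time of a windowed result is the latest event time among the inputs that contributed to it. Its latency is therefore the emission time minus that maximum. Throughput is the number of tuples processed per second over a measurement interval. The `Output` record and its fields are hypothetical, and the paper's exact definitions may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Output:
    """Hypothetical record of one result emitted by a stateful (e.g. windowed) operator."""
    emit_time: float          # wall-clock time the result left the system, in seconds
    input_event_times: list   # event times of the input tuples that contributed to it


def event_time_latency(out):
    """Latency of one stateful result, assuming its event time is the latest
    event time among its contributing inputs."""
    return out.emit_time - max(out.input_event_times)


def throughput(num_tuples, start, end):
    """Tuples processed per second over the measurement interval [start, end]."""
    return num_tuples / (end - start)


o = Output(emit_time=10.5, input_event_times=[7.0, 9.0, 8.0])
print(event_time_latency(o))          # 1.5
print(throughput(1000, 0.0, 4.0))     # 250.0
```

Using the latest contributing event time avoids charging the window's own length to the system under test. A result cannot be emitted before its last input arrives.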
IoT and Big Data Infrastructure for Smart Demand Response Services
[article]
2022
Zenodo
As a proof of concept, data from ENERTALK open dataset were analyzed using Apache Spark. ...
With this in mind, the goal is to design an effective infrastructure that enables streaming of energy consumption data from smart plugs and real-time analytics of that data. ...
FIGURE 8. Architecture of Apache Spark (Apache, n.d.) [figure labels: Streaming, SQL, GraphX, MLlib, R; Spark core libraries; resource manager; data storage; physical layer] ...
doi:10.5281/zenodo.5901466
fatcat:mdai5vvuhnajndrwwpqvpjs2ly
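Real-time analytics over smart-plug readings usually rests on windowed aggregation. The sketch below is a plain-Python stand-in for such an operator, not Spark Structured Streaming code. It keeps a time-based sliding window of power readings and returns the current mean after each new reading. The class name, the window length, and the watt units are assumptions for illustration.

```python
from collections import deque


class SlidingAverage:
    """Mean power over the readings whose timestamps fall in (ts - window, ts].

    Hypothetical helper; assumes readings arrive in timestamp order and window > 0.
    """

    def __init__(self, window):
        self.window = window
        self.readings = deque()   # (timestamp, watts), oldest on the left
        self.total = 0.0

    def add(self, ts, watts):
        """Ingest one reading and return the mean of the readings still in the window."""
        self.readings.append((ts, watts))
        self.total += watts
        # Evict readings that have slid out of the window.
        while self.readings and self.readings[0][0] <= ts - self.window:
            _, old = self.readings.popleft()
            self.total -= old
        # The newest reading is never evicted, so the deque is non-empty here.
        return self.total / len(self.readings)


s = SlidingAverage(window=10)
for ts, w in [(0, 100), (5, 200), (12, 300)]:
    print(ts, s.add(ts, w))
```

Keeping a running total makes each update amortized O(1) rather than re-summing the window. This matters when many plugs report at high frequency.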
Big Data Analytics = Machine Learning + Cloud Computing
[chapter]
2016
Big Data
Spark consists of seven major elements: Spark core of data engine, Spark cluster manager (includes Hadoop, Apache Mesos and built-in Standalone cluster manager), Spark SQL, Spark streaming, Spark Machine ...
It became an Apache project in 2007. Since then, it has absorbed many tools in Apache Lucene's library to enhance and extend its full text search capability. ...
Cafarella created it in early 2006, their original idea was to build Apache Nutch (or a web crawler engine) on a cheaper infrastructure. ...
doi:10.1016/b978-0-12-805394-2.00001-5
fatcat:2a2avnxwivbztmp7iksxqgkv2a
Experiences in the Development of a Data Management System for Genomics
[chapter]
2018
Communications in Computer and Information Science
Coherently, GMQL is a high-level, declarative language inspired by big data management, and its execution engines include classic cloud-based systems, from Pig to Flink to SciDB to Spark. ...
and personalized medicine will increasingly rely on data extraction and analysis methods for inferring new knowledge from existing heterogeneous repositories of processed datasets, typically augmented with ...
From bottom to top, it includes the repository layer, the engine layer and the GMQL layer, which in turn consists of an orchestrator and a compiler, and is accessible through a web service API. ...
doi:10.1007/978-3-319-94809-6_10
fatcat:zhkditdco5b2nh3fvgmjygn5yu
Big Data Analytics = Machine Learning + Cloud Computing
[article]
2016
arXiv
pre-print
We augment 3Vs with additional attributes of Big Data to make it more comprehensive and relevant. ...
This chapter is devoted to help decision makers by defining BDA as a solution and opportunity to address their business needs. ...
Spark consists of seven major elements: Spark core of data engine, Spark cluster manager (includes Hadoop, Apache Mesos and built-in Standalone cluster manager), Spark SQL, Spark streaming, Spark Machine ...
arXiv:1601.03115v1
fatcat:ogzvtaigsngelj7hhlkqzheraa
Big Data and Personalisation for Non-Intrusive Smart Home Automation
2021
Big Data and Cognitive Computing
We employ open-source frameworks such as Apache Spark, Apache NiFi and FB-Prophet along with popular vendor tech-stacks such as Azure and DataBricks. ...
We demonstrate the implementation of our proposed novel technology instantiation approach for achieving non-intrusive IoT based big data analytics with a use case of a smart home environment. ...
Acknowledgments: This work contains processed information from static and dynamic datasets from a smart home setting with smart hub running on a docker container with Raspberry Pi, OpenHab and Apache NiFi ...
doi:10.3390/bdcc5010006
fatcat:6dv4b3dvbrdkjei2vvccr5pstq
Benchmarking Distributed Stream Data Processing Systems
2018
2018 IEEE 34th International Conference on Data Engineering (ICDE)
We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. ...
First, we give a definition of latency and throughput for stateful operators. ...
StreamBench [19] , proposes a method to measure the throughput and latency of a SDPS with the use of a mediator system between the data source and the SUT. ...
doi:10.1109/icde.2018.00169
dblp:conf/icde/KarimovRKSHM18
fatcat:yfvlfvsgvzaj7opgqin6cudxzu
Comparative Study of Record Linkage Approaches for Big Data
2021
Walailak Journal of Science and Technology
Spark and Apache Flink. ...
Although the comparative study includes many recent studies supporting Apache Spark, adopting Apache Spark to solve the problem of record linkage is not yet well explored in literature, as more researches ...
[36] presented SparkER-MOMISDF (Spark Entity Resolution Mediator envirOnment for Multiple Information Sources Data Fusion) where SparkER presented in [32] will be extended with post-processing methods ...
doi:10.48048/wjst.2021.7221
fatcat:etnf63jobzhkpgad5z762pe4qm
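Record linkage at scale usually avoids comparing every pair of records by blocking them first. The sketch below shows one simple variant in plain Python: token blocking followed by Jaccard similarity on word tokens. It is not the SparkER or SparkER-MOMISDF implementation. The whitespace tokenizer and the 0.5 threshold are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record):
    """Lower-cased whitespace tokens of a record (a deliberately simple tokenizer)."""
    return set(record.lower().split())


def token_blocking(records):
    """Candidate pairs: records that share at least one token land in a common block."""
    blocks = defaultdict(list)
    for i, r in enumerate(records):
        for t in tokens(r):
            blocks[t].append(i)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))   # ids are increasing, so i < j
    return pairs


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    return len(a & b) / len(a | b)


def link(records, threshold=0.5):
    """Sorted index pairs whose token Jaccard similarity reaches the threshold."""
    return sorted(
        (i, j)
        for i, j in token_blocking(records)
        if jaccard(tokens(records[i]), tokens(records[j])) >= threshold
    )


records = ["John Smith Boston", "Jon Smith Boston", "Mary Jones Denver"]
print(link(records))   # [(0, 1)]
```

Blocking limits the quadratic comparison step to records that share a block. In a distributed setting, the blocks are also natural units of parallel work.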
Query processing in multistore systems: an overview
2016
International Journal of Cloud Computing
Compared to traditional DBMSs, cloud data management uses a different software stack with the following layers: distributed storage, database management and distributed processing. ...
Parallel DBMSs use either a shared-nothing or shared-disk architecture. With ...
Spark SQL: Spark SQL [AXL+15] is a recent module in Apache Spark that integrates relational data processing with Spark's functional programming API. ...
doi:10.1504/ijcc.2016.080903
fatcat:etteedwysbcyfidbaeh7x2lrne
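The shared-nothing architecture mentioned in the last entry can be sketched as hash partitioning plus local aggregation. Each row is assigned to exactly one node by hashing its key. Every node aggregates only its own partition, and a coordinator merges the partial results. In this sketch, nodes are simulated as Python lists. `zlib.crc32` is used only because it gives a hash that is deterministic across runs; a real parallel DBMS or Spark would use its own partitioner.

```python
import zlib
from collections import Counter


def partition(rows, key, n_nodes):
    """Shared-nothing placement: each row goes to exactly one node by hashing its key."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode())
        parts[h % n_nodes].append(row)
    return parts


def local_count(part, key):
    """Each node counts keys in its own partition only (no shared memory or disk)."""
    return Counter(row[key] for row in part)


def parallel_count(rows, key, n_nodes=3):
    """Partition, aggregate locally, then merge the partial counts at a coordinator."""
    total = Counter()
    for part in partition(rows, key, n_nodes):
        total.update(local_count(part, key))
    return dict(total)


rows = [{"city": "a"}, {"city": "b"}, {"city": "a"}]
print(parallel_count(rows, "city"))   # {'a': 2, 'b': 1}
```

All rows with the same key land on the same node. Each node's partial count for a key is therefore already final, and the merge only has to concatenate the results.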