
A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma

Dimitris Stripelis, Jose Luis Ambite, Yao-Yi Chiang, Sandrah P. Eckel, Rima Habre
2017 2017 IEEE 33rd International Conference on Data Engineering (ICDE)  
A main contribution of this work is extending the Spark framework with a mediation layer, based on logical schema mappings and query rewriting, to facilitate data analysis over a consistent harmonized  ...  Our architecture is based on the Apache Kafka, Spark and Hadoop frameworks and PostgreSQL DBMS.  ...  The Apache Spark SQL engine [1] handles the distribution of the queries and pushes raw sensor data and harmonized data to the Storage layer. The Storage layer manages permanent data storage.  ... 
doi:10.1109/icde.2017.198 pmid:29731601 pmcid:PMC5935488 dblp:conf/icde/StripelisACEH17 fatcat:3i5xtfrrlrdfrbzz6ndpdoct7i

Collaborative Cloud Computing Framework for Health Data with Open Source Technologies [article]

Fatemeh Rouzbeh, Ananth Grama, Paul Griffin, Mohammad Adibuzzaman
2020 arXiv   pre-print
We propose a novel architecture for software-hardware-data ecosystem using open source technologies such as Apache Hadoop, Kubernetes and JupyterHub in a distributed environment.  ...  kernel, iii) scalable, and iv) compliant with the HIPAA privacy law.  ...  UlTraMan is an integrated platform of extended Apache Spark in both data storage and computing aspects. The extension has been made by integrating Spark with Chronicle Map and enhanced MapReduce.  ... 
arXiv:2007.10498v1 fatcat:gucnqk6gorfsni6qsm5cx3bjja

NFDI Data Integration

Bernhard Seeger, Andreas Henrich, Thorsten Papenbrock, Dirk Riehle
2022 Zenodo  
In addition, this project develops a recommendation for an architecture and provides the next steps for a reference implementation.  ...  Our envisaged work plan follows an agile approach with an iterative development of results and their validation against an increasing number of use stories obtained from various NFDI consortia.  ...  data integration workflows would e. g. be Apache Spark).  ... 
doi:10.5281/zenodo.6518771 fatcat:762le3dojvd2ji7f3x6owbuyca

NFDI Data Integration

Bernhard Seeger, Andreas Henrich, Thorsten Papenbrock, Dirk Riehle
2022 Zenodo  
In addition, this project develops a recommendation for an architecture and provides the next steps for a reference implementation.  ...  Our envisaged work plan follows an agile approach with an iterative development of results and their validation against an increasing number of use stories obtained from various NFDI consortia.  ...  data integration workflows would e. g. be Apache Spark).  ... 
doi:10.5281/zenodo.6519590 fatcat:3r3voqdfevbonjai7xywffpweq

Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources [article]

Edmon Begoli, Jesús Camacho Rodríguez, Julian Hyde, Michael J. Mior, Daniel Lemire
2018 arXiv   pre-print
Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter  ...  Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache  ...  DE-AC05-00OR22725 with the U.S. Department of Energy.  ... 
arXiv:1802.10233v1 fatcat:c66gmtyyz5gkbalnyqbhcbnl7u

Apache Calcite

Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael J. Mior, Daniel Lemire
2018 Proceedings of the 2018 International Conference on Management of Data - SIGMOD '18  
Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter  ...  Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache  ...  DE-AC05-00OR22725 with the U.S. Department of Energy.  ... 
doi:10.1145/3183713.3190662 dblp:conf/sigmod/BegoliCHML18 fatcat:rbov7kz6svf7ngv3otxarfnuwy

Benchmarking Distributed Stream Processing Engines [article]

Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl
2018 arXiv   pre-print
In this paper, we propose a framework to evaluate the performance of three SDPSs, namely Apache Storm, Apache Spark, and Apache Flink.  ...  First, we give a definition of latency and throughput for stateful operators.  ...  StreamBench [21], proposes a method to measure the throughput and latency of a SDPS with the use of a mediator system between the data source and the SUT.  ... 
arXiv:1802.08496v1 fatcat:zfdxcy3eajcv7eqafymethtvmq

IoT and Big Data Infrastructure for Smart Demand Response Services [article]

Radenkovic Milos, Popovic Snezana, Despotovic-Zrakic Marijana, Naumovic Tamara, Lazarevic Sasa
2022 Zenodo  
As a proof of concept, data from ENERTALK open dataset were analyzed using Apache Spark.  ...  With this in mind, the goal is to design an effective infrastructure that enables streaming of energy consumption data from smart plugs and real-time analytics of that data.  ...  Figure 8. Architecture of Apache Spark (Apache, n.d.)  ... 
doi:10.5281/zenodo.5901466 fatcat:mdai5vvuhnajndrwwpqvpjs2ly

Big Data Analytics = Machine Learning + Cloud Computing [chapter]

C. Wu, R. Buyya, K. Ramamohanarao
2016 Big Data  
Spark consists of seven major elements: Spark core of data engine, Spark cluster manager (includes Hadoop, Apache Mesos and built-in Standalone cluster manager), Spark SQL, Spark streaming, Spark Machine  ...  It became an Apache project in 2007. Since then, it has absorbed many tools in Apache Lucene's library to enhance and extend its full text search capability.  ...  Cafarella created it in early 2006, their original idea was to build Apache Nutch (or a web crawler engine) on a cheaper infrastructure.  ... 
doi:10.1016/b978-0-12-805394-2.00001-5 fatcat:2a2avnxwivbztmp7iksxqgkv2a

Experiences in the Development of a Data Management System for Genomics [chapter]

Stefano Ceri, Arif Canakoglu, Abdulrahman Kaitoua, Marco Masseroli, Pietro Pinoli
2018 Communications in Computer and Information Science  
Coherently, GMQL is a high-level, declarative language inspired by big data management, and its execution engines include classic cloud-based systems, from Pig to Flink to SciDB to Spark.  ...  and personalized medicine will increasingly rely on data extraction and analysis methods for inferring new knowledge from existing heterogeneous repositories of processed datasets, typically augmented with  ...  From bottom to top, it includes the repository layer, the engine layer and the GMQL layer, which in turn consists of an orchestrator and a compiler, and is accessible through a web service API.  ... 
doi:10.1007/978-3-319-94809-6_10 fatcat:zhkditdco5b2nh3fvgmjygn5yu

Big Data Analytics = Machine Learning + Cloud Computing [article]

Caesar Wu, Rajkumar Buyya, Kotagiri Ramamohanarao
2016 arXiv   pre-print
We augment 3Vs with additional attributes of Big Data to make it more comprehensive and relevant.  ...  This chapter is devoted to help decision makers by defining BDA as a solution and opportunity to address their business needs.  ...  Spark consists of seven major elements: Spark core of data engine, Spark cluster manager (includes Hadoop, Apache Mesos and built-in Standalone cluster manager), Spark SQL, Spark streaming, Spark Machine  ... 
arXiv:1601.03115v1 fatcat:ogzvtaigsngelj7hhlkqzheraa

Big Data and Personalisation for Non-Intrusive Smart Home Automation

Suriya Priya R. Asaithambi, Sitalakshmi Venkatraman, Ramanathan Venkatraman
2021 Big Data and Cognitive Computing  
We employ open-source frameworks such as Apache Spark, Apache NiFi and FB-Prophet along with popular vendor tech-stacks such as Azure and DataBricks.  ...  We demonstrate the implementation of our proposed novel technology instantiation approach for achieving non-intrusive IoT based big data analytics with a use case of a smart home environment.  ...  Acknowledgments: This work contains processed information from static and dynamic datasets from a smart home setting with smart hub running on a docker container with Raspberry Pi, OpenHab and Apache NiFi  ... 
doi:10.3390/bdcc5010006 fatcat:6dv4b3dvbrdkjei2vvccr5pstq

Benchmarking Distributed Stream Data Processing Systems

Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl
2018 2018 IEEE 34th International Conference on Data Engineering (ICDE)  
We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink.  ...  First, we give a definition of latency and throughput for stateful operators.  ...  StreamBench [19], proposes a method to measure the throughput and latency of a SDPS with the use of a mediator system between the data source and the SUT.  ... 
doi:10.1109/icde.2018.00169 dblp:conf/icde/KarimovRKSHM18 fatcat:yfvlfvsgvzaj7opgqin6cudxzu

Comparative Study of Record Linkage Approaches for Big Data

Randa MOHAMED, Ali EL-BASTAWISSY, Eman NASR, Mervat GHEITH
2021 Walailak Journal of Science and Technology  
Spark and Apache Flink.  ...  Although the comparative study includes many recent studies supporting Apache Spark, adopting Apache Spark to solve the problem of record linkage is not yet well explored in literature, as more researches  ...  [36] presented SparkER-MOMISDF (Spark Entity Resolution Mediator envirOnment for Multiple Information Sources Data Fusion) where SparkER presented in [32] will be extended with post-processing methods  ... 
doi:10.48048/wjst.2021.7221 fatcat:etnf63jobzhkpgad5z762pe4qm

Query processing in multistore systems: an overview

Carlyna Bondiombouy, Patrick Valduriez
2016 International Journal of Cloud Computing  
Compared to traditional DBMSs, cloud data management uses a different software stack with the following layers: distributed storage, database management and distributed processing.  ...  Parallel DBMSs use either a shared-nothing or shared-disk architecture. With  ...  Spark SQL Spark SQL [AXL + 15] is a recent module in Apache Spark that integrates relational data processing with Spark's functional programming API.  ... 
doi:10.1504/ijcc.2016.080903 fatcat:etteedwysbcyfidbaeh7x2lrne