Kafka interfaces for composable streaming genomics pipelines [article]

Francesco Versaci, Luca Pireddu, Gianluigi Zanetti
2017 bioRxiv   pre-print
Modern sequencing machines produce order of a terabyte of data per day, which need subsequently to go through a complex processing pipeline.  ...  We decompose the first steps of the genomic processing in two distinct and specialized modules (preprocessing and alignment) and we loosely compose them via communication through Kafka streams, in order  ...  on Kafka and its potential uses.  ... 
doi:10.1101/182030 fatcat:bvewunvwkjhcronirwkzjechsy

Kafka interfaces for composable streaming genomics pipelines

Francesco Versaci, Luca Pireddu, Gianluigi Zanetti
2018 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI)  
Modern sequencing machines produce order of a terabyte of data per day, which need subsequently to go through a complex processing pipeline.  ...  We decompose the first steps of the genomic processing in two distinct and specialized modules (preprocessing and alignment) and we loosely compose them via communication through Kafka streams, in order  ...  on Kafka and its potential uses.  ... 
doi:10.1109/bhi.2018.8333418 dblp:conf/bhi/VersaciPZ18 fatcat:3tl35kjc2rfx5jap2ex2ynqd2a

Pico: A Domain-Specific Language For Data Analytics Pipelines

Claudia Misale, Marco Aldinucci, Guy Tremblay
2017 Zenodo  
, as we are often used to see in state-of-the-art Big Data analytics frameworks.  ...  Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective.  ...  Acknowledgements Funding This work has been partially supported by the Italian Ministry of Education and Research (MIUR), by the EU-H2020 RIA project "Toreador" (no. 688797), the EU-H2020 RIA project  ... 
doi:10.5281/zenodo.579753 fatcat:aadje57qh5hk3ijmqn4j7vkhpm

A Comparative Study on Streaming Frameworks for Big Data

Wissem Inoubli, Sabeur Aridhi, Haithem Mezni, Mondher Maddouri, Engelbert Mephu Nguifo
2018 Very Large Data Bases Conference  
We also present an experimental evaluation and a comparative study of the most popular streaming platforms.  ...  Yet, many research works focus on streaming in Big Data, a task referring to the processing of massive volumes of structured/unstructured streaming data.  ...  Acknowledgements This research was partially supported by the General Direction of Scientific Research in Tunisia (DGRST).  ... 
dblp:conf/vldb/InoubliAMMN18 fatcat:pjb6jwacardhhenoep5aqs3tse

A Survey of Distributed Data Stream Processing Frameworks

Haruna Isah, Tariq Abughofa, Sazia Mahfuz, Dharmitha Ajerla, Farhana Zulkernine, Shahzad Khan
2019 IEEE Access  
source (Storm, Spark Streaming, Flink, Kafka Streams) and commercial (IBM Streams) distributed data stream processing frameworks.  ...  One of the challenges in developing a streaming analytics infrastructure is the difficulty in selecting the right stream processing framework for the different use cases.  ...  We will describe a DSPS design use case scenario and our choice of DSPE for this use case based on this study. V. DESIGN AND DEVELOPMENT OF A DSPS A.  ... 
doi:10.1109/access.2019.2946884 fatcat:lu6oknfpkraybmtuqxismmlqda

Review of Big Data and Processing Frameworks for Disaster Response Applications

Silvino Pedro Cumbane, Gyozo Gidófalvi
2019 ISPRS International Journal of Geo-Information  
Firstly, potential big data sources are described and characterized. Secondly, the big data processing frameworks are characterized and grouped based on the sources of data they handle.  ...  Deciding which processing framework to use for a specific big data to perform a given task is usually a challenge for researchers from the disaster management field.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/ijgi8090387 fatcat:2fhh4kol2nfatbokwiggnfm5ge

Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit [article]

Ben Blamey, Salman Toor, Martin Dahlö, Håkan Wieslander, Philip J Harrison, Ida-Maria Sintorn, Alan Sabirsh, Carolina Wählby, Ola Spjuth, Andreas Hellander
2020 biorxiv/medrxiv   pre-print
The result is smart data pipelines capable of effective or even optimal use of e.g. storage, compute and network bandwidth, to support experiments involving rapid processing of scientific data characterized  ...  The HASTE Toolkit is a collection of tools to adapt data stream processing to this pipeline model.  ...  Thanks to Polina Georgiev for providing the images used in the evaluation of Case Study 1. Resources from The Swedish National Infrastructure for Computing (SNIC) [34] were used for Case Study 2.  ... 
doi:10.1101/2020.09.13.274779 fatcat:x22wrgsvubby5kpkhy7blruqd4

An experimental survey on big data frameworks

Wissem Inoubli, Sabeur Aridhi, Haithem Mezni, Mondher Maddouri, Engelbert Mephu Nguifo
2018 Future generations computer systems  
We also present an experimental evaluation and a comparative study of the most popular Big Data frameworks.  ...  This survey is concluded with a presentation of best practices related to the use of the studied frameworks in several application domains such as machine learning, graph processing and real-world applications  ...  In this case, recommender systems must be able to process the big stream of data.  ... 
doi:10.1016/j.future.2018.04.032 fatcat:dxl42yu54retblcgttysadacqu

Big-Data framework-based visualization solution for performance analysis of positioning systems in railway environments

Zheng Liu, Iñigo Adin, Saioa Arrizabalaga, Jon Goya, Javier Añorga, Sijia Yang
2018 Zenodo  
A visualization platform based on Big-Data frameworks was highly demanded in this context and several alternatives were analyzed.  ...  This paper describes the approaches that have been tested and implemented, describes the difficulties and advantages of the alternatives and provides detailed steps for adapting the open-source Big-Data  ...  Also, Apache Flink is used to process the stream data, which can be programmed as Kafka producer or Kafka consumer.  ... 
doi:10.5281/zenodo.1483963 fatcat:33xwcr5xmfe37mqsdzkczqcmve

Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit

Ben Blamey, Salman Toor, Martin Dahlö, Håkan Wieslander, Philip J Harrison, Ida-Maria Sintorn, Alan Sabirsh, Carolina Wählby, Ola Spjuth, Andreas Hellander
2021 GigaScience  
We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources.  ...  We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy".  ...  Thanks to Polina Georgiev for providing the images used in the evaluation of Case Study 1. Resources from The Swedish National Infrastructure for Computing (SNIC) [34] were used for Case Study 2.  ... 
doi:10.1093/gigascience/giab018 pmid:33739401 pmcid:PMC7976223 fatcat:2ytlje4u2jfulg7gquwiffn3me

Edge and Cluster Computing as Enabling Infrastructure for Internet of Medical Things

Pierluigi Ritrovato, Fatos Xhafa, Andrea Giordano
2018 2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA)  
The purpose of the Flink cluster is to get data from Kafka cluster and process them: in particular, Flink parses data streams stored in Kafka and detects potential anomalies within them.  ... 
doi:10.1109/aina.2018.00108 dblp:conf/aina/RitrovatoXG18 fatcat:nucsqmuowjcc7n2ijcp3c233im

4.-8. März 2019

Melissa Gehring, Marcela Charfuelan, Volker Markl
2019 Datenbanksysteme für Business, Technologie und Web  
Published studies of performance are used to compare several open-source systems, and two systems are further selected for qualitative comparison and evaluation regarding the development of a time series  ...  Given the vast number of data processing systems available today, in this paper, we aim to identify, select, and evaluate systems to determine the one that is better suited to use in conducting time series  ...  Acknowledgments This work was supported by the German Federal Ministry of Economics and Technology (BMWi) funded SePiA.Pro project, under grant FKZ: 01MD16013.  ... 
doi:10.18420/btw2019-ws-21 dblp:conf/btw/GehringCM19 fatcat:hmcrwv3rjfbpvczh5y44vzx6ue

Industry 4.0 towards Forestry 4.0: Fire Detection Use Case

Radhya Sahal, Saeed H. Alsamhi, John G. Breslin, Muhammad Intizar Ali
2021 Sensors  
Querying windowing is the heart of any stream-processing platform which splits infinite data stream into chunks of finite data to execute a query.  ...  Distributed stream processing platforms have emerged, e.g., Apache Flink, Storm, and Spark, etc.  ...  Data Stream Processing Pipeline This section presents a general overview of the data stream processing pipeline for a query engine used in the Flink streaming platform.  ... 
doi:10.3390/s21030694 pmid:33498450 fatcat:oboxgvvgmveh7oumseypmdt3xi

NAMB: A Quick and Flexible Stream Processing Application Prototype Generator

Alessio Pagliari, Fabrice Huet, Guillaume Urvoy-Keller
2020 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)  
ACKNOWLEDGMENTS Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities  ...  Data Stream a) Data Characteristics: Data variety is a base characteristic of Big Data.  ...  INTRODUCTION New trends in Big Data require to process high-rate unbounded data flows in almost real-time.  ... 
doi:10.1109/ccgrid49817.2020.00-87 dblp:conf/ccgrid/PagliariHU20 fatcat:ep226jzmzjavdk7v6avx74vh4u

Big Data Analytics Technologies and Platforms: A Brief Review

Ticiana L. Coelho da Silva, Regis Pires Magalhães, Igo Ramalho Brilhante, José A. F. de Macêdo, David Araújo, Paulo A. L. Rego, Aloisio Vieira Lira Neto
2018 Very Large Data Bases Conference  
problems as processing (streaming and batch), storage, data integration, analytics, data governance, and monitoring.  ...  A plethora of Big Data Analytics technologies and platforms have been proposed in the last years. However, in 2017, only 53% of companies are adopting such tools.  ...  Acknowledgments This work has been supported by FUNCAP SPU 8789771/2017 research project and CAPES fellowship.  ... 
dblp:conf/vldb/SilvaMBMARN18 fatcat:ny53uz6ixre7nnib3hga24hque