KerA: Scalable Data Ingestion for Stream Processing

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, Maria Perez-Hernandez, Bogdan Nicolae, Radu Tudoran, Stefano Bortoli
2018 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)  
This paper introduces KerA, a novel ingestion system for scalable stream processing that addresses the aforementioned limitations of the state of art.  ...  DESIGN PRINCIPLES FOR STREAM INGESTION In order to address the issues detailed in the previous section, we introduce a set of design principles for efficient stream ingestion and scalable processing. a  ... 
doi:10.1109/icdcs.2018.00152 dblp:conf/icdcs/MarcuCAPNTB18 fatcat:pphuux34bzar5jxeapv2nwlug4

A Scalable and Robust Framework for Data Stream Ingestion [article]

Haruna Isah, Farhana Zulkernine
2018 arXiv   pre-print
This paper investigates the fundamental requirements and the state of the art of existing data stream ingestion systems, proposes a scalable and fault-tolerant data stream ingestion and integration framework  ...  The study also identifies best practices and gaps for future research in developing large-scale data stream processing infrastructure.  ...  ACKNOWLEDGMENT Special thanks to Southern Ontario Smart Computing for Innovation Platform (SOSCIP) and IBM Canada for supporting this research project.  ... 
arXiv:1812.04197v1 fatcat:freh5fgeu5ezhbfi5lutmc6smy

Towards a unified storage and ingestion architecture for stream processing

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, Maria S. Perez-Hernandez, Radu Tudoran, Stefano Bortoli, Bogdan Nicolae
2017 2017 IEEE International Conference on Big Data (Big Data)  
In this position paper, we argue for a unified ingestion and storage architecture for streaming data that addresses the aforementioned challenge.  ...  Current streaming-oriented runtimes and middlewares are not flexible enough to deal with this trend, as they address ingestion (collection and pre-processing of data streams) and persistent storage (archival  ...  Storage systems and processing engines should understand DIPS interfaces (i.e., for data ingestion, processing, and storage for streams of records): they offer APIs to read, write, process and store streams  ... 
doi:10.1109/bigdata.2017.8258196 dblp:conf/bigdataconf/MarcuCAPTBN17 fatcat:w5a2fruranfflbwchhfuerqrpm

In-database distributed machine learning

Sandeep Singh Sandha, Wellington Cabrera, Mohammed Al-Kateb, Sanjay Nair, Mani Srivastava
2019 Proceedings of the VLDB Endowment  
on data scientists and becomes a performance bottleneck for model training.  ...  The popular approach - training machine learning models in frameworks like TensorFlow, PyTorch and Keras - requires movement of data from database engines to analytical engines, which adds an excessive overhead  ...  To check scalability, three Teradata clusters of different size would be available for computation: small, medium and large.  ... 
doi:10.14778/3352063.3352083 fatcat:uz7cagmlpzcg5mbmxt75xurguu

Detecting Irregular Patterns in IoT Streaming Data for Fall Detection [article]

Sazia Mahfuz, Haruna Isah, Farhana Zulkernine, Peter Nicholls
2018 arXiv   pre-print
The initial model was developed using IBM Watson studio and then later transferred and deployed on IBM Cloud with the streaming analytics service supported by IBM Streams for monitoring real-time IoT data  ...  Detecting patterns in real time streaming data has been an interesting and challenging data analytics problem.  ...  Typical examples of these cutting-edge tools include Kafka Connect, a framework included in Apache Kafka [22] for data stream ingestion, IBM Streams for data stream processing, Cassandra [23] for efficient  ... 
arXiv:1811.06672v1 fatcat:jilcnzf6qvg5vo4l3ck6quwf6q

Fog Computing for Smart Cities' Big Data Management and Analytics: A Review

Elarbi Badidi, Zineb Mahrez, Essaid Sabir
2020 Future Internet  
The new and emerging paradigms of edge and fog computing promise to address big data storage and analysis in the field of smart cities.  ...  Furthermore, many smart city applications are time-sensitive and need to quickly analyze data to react promptly to the various events occurring in a city.  ...  These components include tools for data ingestion, data stream processing and analytics, and data visualization.  ... 
doi:10.3390/fi12110190 fatcat:i3ziazyorve2rc2zvhxzskirhm

Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

Giang Nguyen, Stefan Dlugolinsky, Martin Bobák, Viet Tran, Álvaro López García, Ignacio Heredia, Peter Malík, Ladislav Hluchý
2019 Artificial Intelligence Review  
It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.  ...  Giang Nguyen, Viet Tran, Stefan Dlugolinsky, Martin Bobák, and Ladislav Hluchý are also supported by the Project VEGA 2/0167/16 "Methods and algorithms for the semantic processing of Big Data in distributed  ...  The authors would like to thank all colleagues, especially Ján Astaloš, for knowledge sharing and teamwork.  ... 
doi:10.1007/s10462-018-09679-z fatcat:ueffoypwlva4ndo35g5gzfrpcy

Big Data Processing and Analytics Platform Architecture for Process Industry Factories

Martin Sarnovsky, Peter Bednar, Miroslav Smatana
2018 Big Data and Cognitive Computing  
This paper describes the architecture of a cross-sectorial Big Data platform for the process industry domain.  ...  The main objective was to design a scalable analytical platform that will support the collection, storage and processing of data from multiple industry domains.  ...  Apache Flink is an open source framework for distributed stream processing. Flink uses the concepts of streams for all applications. In Flink's terms, a batch is a finite set of streamed data.  ... 
doi:10.3390/bdcc2010003 fatcat:xiporsbfwjfbbnlrxyxd5l7asu

Machine Learning Based Approach to Anomaly and Cyberattack Detection in Streamed Network Traffic Data

Mikolaj Komisarek, Marek Pawlicki, Rafal Kozik, Michal Choras
2021 Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications  
In this paper, the performance of a solution providing stream processing is evaluated, and its accuracy in the classification of suspicious flows in simulated network traffic is investigated.  ...  The tool allows easy definition of streams and implementation of any machine learning algorithm.  ...  This work has been also supported by the SIMARGL Project -Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware, with the support of the European Commission and the Horizon 2020  ... 
doi:10.22667/jowua.2021.03.31.003 dblp:journals/jowua/KomisarekPKC21 fatcat:u67e3qikzrau3aky6e7tdr5u5y

D4.1 REUSABLE MODEL & ANALYTICAL TOOLS: DESIGN AND OPEN SPECIFICATION 1

Ofer Biran, Oshrit Feder, Sandra Ebro, Alejandro Ramiro, María Ángeles Sanguino, Jorge Montero, Argyro Mavrogiorgou, Thanos Kiourtis, George Manias, Nikitas Sgouros, Kostas Nasias
2020 Zenodo  
Specification and design of the built-in analytics tools for Situational Knowledge, Opinion Mining & Sentiment Analysis, Social Dynamics & Behavioral Data analysis.  ...  Internal architecture of the Integrated Acquisition and Analytics Layer, responsible for the integration of analytical tools in extensible manner, registration of new data sources and applying the required  ...  data is available, for Ingest-now type the data will be ingested upon registration, processed by the registered Analytic-ingest Function(s), and for External type the data will be read (and possibly processed  ... 
doi:10.5281/zenodo.4081335 fatcat:ei5tghz6drgdnc4akwwrbl7ole

Kafka-ML: Connecting the data stream with ML/AI frameworks

Cristian Martín, Peter Langendoerfer, Pouya Soltani Zarrin, Manuel Díaz, Bartolomé Rubio
2021 Future generations computer systems  
Finally, a novel approach has been introduced to manage and reuse data streams, which may eliminate the need for data storage or file systems.  ...  With the digital revolution and current paradigms like the Internet of Things, this information is turning from static data to continuous data streams.  ...  We are grateful for the work of all the reviewers who have greatly contributed to improving the quality of this article.  ... 
doi:10.1016/j.future.2021.07.037 fatcat:gfwq5qo4frabhjhqen3ayugoni

A Data Ecosystem to Support Machine Learning in Materials Science [article]

Ben Blaiszik, Logan Ward, Marcus Schwarting, Jonathon Gaff, Ryan Chard, Daniel Pike, Kyle Chard, Ian Foster
2019 arXiv   pre-print
Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs.  ...  of new data across the ecosystem, and the connecting of data with materials-specific machine learning models.  ...  and reuse; software tools to simplify data discovery, aggregation and use; and a library of curated machine learning models and processing logic that can easily be applied to new data streams.  ... 
arXiv:1904.10423v1 fatcat:gibssrxzxbaqxdjsfa4ld4gd2a

D2.4 STATE OF THE ART & REQUIREMENTS ANALYSIS M12

Boyan Kolev, Jose María Zaragoza, Patricio Martinez, Luis Miguel Garcia, Ofer Biran, Konstantinos Moutselos, María Ángeles Sanguino, Jorge Montero, Ana Luiza Pontual, Miquel Milà, Ricard Munné, Giuseppe La Rocca (+17 others)
2021 Zenodo  
It provides SQL interface for stream processing above Apache Kafka 29 , and even the open source version is designed for mission-critical and scalable deployments.  ...  ingestion in high rates from a stream Level of detail Software Type DATA (data) Description It has been identified the need to ingest data coming from a data stream to the data store, preserving data  ...  data or analysed data) and the phase of the policy lifecycle (e.g. modelling or experimentation process).  ... 
doi:10.5281/zenodo.4560336 fatcat:pk2tqbdsvrg7hjmwwwz37c2vou

Process Monitoring Platform based on Industry 4.0 tools: a waste-to-energy plant case study

James Clovis Kabugo, Sirkka-Liisa Jamsa-Jounela, Robert Schiemann, Christian Binder
2019 2019 4th Conference on Control and Fault Tolerant Systems (SysTol)  
This work presents a process data analytics platform built around the concept of Industry 4.0.  ...  in a WTE process.  ...  Through the use of IoT cloud gateways, streaming data or historical data is ingested into the industrial IoT cloud platform. Industrial IoT vendors normally provide IoT cloud gateway connections.  ... 
doi:10.1109/systol.2019.8864766 dblp:conf/systol/KabugoJSB19 fatcat:ddhvawl5mzfmxmnuj7ljdvum4q

D2.1 STATE OF THE ART & REQUIREMENTS ANALYSIS

Sandra Ebro, Boyan Kolev, Jose María Zaragoza, Patricio Martinez, Luis Miguel Garcia, Konstantinos Moutselos, Ofer Biran, María Ángeles Sanguino, Jorge Montero, Ana Luiza Pontual, Miquel Milà, Ricard Munné (+14 others)
2020 Zenodo  
Moreover, in order for the platform to keep track with the latest technological advances, a state-of-the-art analysis has been performed regarding the major technologies that are envisioned to be exploited  ...  Apache Spark Streaming) with capability to integrate analytic functions to process the ingested data.  ...  It provides SQL interface for stream processing above Apache Kafka 25 , and even the open source version is designed for mission-critical and scalable deployments.  ... 
doi:10.5281/zenodo.3991661 fatcat:5yzxb3wagnfhrpinlhepazl5py