A Scalable and Robust Framework for Data Stream Ingestion [article]

Haruna Isah, Farhana Zulkernine
2018 arXiv   pre-print
This paper investigates the fundamental requirements and the state of the art of existing data stream ingestion systems, proposes a scalable and fault-tolerant data stream ingestion and integration framework  ...  that can serve as a reusable component across many feeds of structured and unstructured input data in a given platform, and demonstrates the utility of the framework in a real-world data stream processing  ...  ACKNOWLEDGMENT Special thanks to Southern Ontario Smart Computing for Innovation Platform (SOSCIP) and IBM Canada for supporting this research project.  ... 
arXiv:1812.04197v1 fatcat:freh5fgeu5ezhbfi5lutmc6smy

BigDataGrapes D3.2 - Data Ingestion & Integration Components

Panagiotis Zervas, Sotiris Konstantinidis, Antonis Koukourikos
2018 Zenodo  
This accompanying document for deliverable D3.2 Data Ingestion & Integration Components describes the mechanisms and tools that will be used in the BigDataGrapes platform to ingest data of different nature  ...  Also, the document describes the tools that will be used for data integration across the different BigDataGrapes platform layers, as well as for long-term storage and preservation of data.  ...  Apache Flume is a tool which has been designed specifically for ingesting stream data. Flume is distributed in nature, and its flexible architecture makes it a robust solution.  ... 
doi:10.5281/zenodo.1482750 fatcat:p3i7hezygvbhhfod55rb34ss2y

Big Data Real Time Ingestion and Machine Learning

Gautam Pal, Gangmin Li, Katie Atkinson
2018 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP)  
Rather than waiting for data to be collected as a whole at a long periodic interval, streaming analysis lets us identify patterns, and make decisions based on them, as data start arriving.  ...  Data arrives in all shapes and sizes. Many times data are acquired sequentially, as an infinite, ever-growing stream.  ...  With respect to ingestion, Apache Kafka and Flume are the two popular choices for high-velocity Big Data ingestion due to their horizontal scalability and robust failover.  ... 
doi:10.1109/dsmp.2018.8478598 fatcat:e5emr6hxrffvlgxcvxloxtwudm

BigDataGrapes D2.3 - BigDataGrapes Software Stack Design

Panagis Katsivelis
2021 Zenodo  
Deliverable D2.3 is the first submitted iteration of a living document that will describe the components and architecture of the BigDataGrapes software stack, providing design principles and technical  ...  To this end, the specifications document is treated as a living document, with regular submission to the EC of versions that report on significant changes in design and functionality.  ...  Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.  ... 
doi:10.5281/zenodo.4546026 fatcat:a35e4prb5ffqjmsammsac4e52q

BigDataGrapes D2.3 - BigDataGrapes Software Stack Design

Pythagoras Karampiperis, Antonis Koukourikos, Raffaele Perego, Franco Maria Nardini, Nicola Tonellotto, Milena Yankova, Panagiotis Zervas, Mihalis Papakonstadinou
2019 Zenodo  
Deliverable D2.3 is the first submitted iteration of a living document that will describe the components and architecture of the BigDataGrapes software stack, providing design principles and technical  ...  To this end, the specifications document is treated as a living document, with regular submission to the EC of versions that report on significant changes in design and functionality.  ...  Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.  ... 
doi:10.5281/zenodo.3269261 fatcat:7czrc4pqb5hn5kuwh34ytuzmjq

ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare [article]

Akshay Arora, Arun Nethi, Priyanka Kharat, Vency Verghese, Grant Jenkins, Steve Miff, Vikas Chowdhry, Xiao Wang
2019 arXiv   pre-print
data platform for inferring population as well as patient level insights for Social Determinants of Health (SDoH), and (3) ingesting live-streaming data from various IoT sensors to build models, which  ...  fit for healthcare due to complexities in handling data quality issues, mandates to demonstrate clinical relevance, and a lack of ability to monitor performance in a highly regulated environment with  ...  Finally, we would like to thank everyone on the Isthmus team for making Isthmus a success. Disclaimer Trademark products mentioned in this paper are properties of respective trademark owners.  ... 
arXiv:1909.13343v2 fatcat:iis6r37t55davmvag4g2sxo6pq

BigDataGrapes D2.3 - BigDataGrapes Software Stack Design [article]

Antonis Koukourikos, Pythagoras Karampiperis
2018 Zenodo  
Deliverable D2.3 is the first submitted iteration of a living document that will describe the components and architecture of the BigDataGrapes software stack, providing design principles and technical  ...  To this end, the specifications document is treated as a living document, with regular submission to the EC of versions that report on significant changes in design and functionality.  ...  A Java-based ingestion tool, Flume is used when input data streams in faster than it can be consumed. Typically, Flume is used to ingest streaming data into HDFS.  ... 
doi:10.5281/zenodo.1308602 fatcat:lwb4dthdunhtvfozf43qitppjy

Data Ingestion using a Novel Method: H-Stream Framework

2020 International journal of recent technology and engineering  
Data ingestion tools of Hadoop play a key role in the processing of streamed log data. As the volume of data increases, the performance of data ingestion tools goes down linearly.  ...  known as H-Stream framework.  ...  For these approaches, the stream processing pipeline plays a key role in the ingestion of data. [2] A.  ... 
doi:10.35940/ijrte.e6045.018520 fatcat:474nhdtervd4nbsqqkhwodzzwq

A Scalable Architecture for Operational FMV Exploitation

William R. Thissell, Robert Czajkowski, Frank Schrenk, Timothy Selway, Anthony J. Ries, Shamoli Patel, Patricia L. McDermott, Rod Moten, Ron Rudnicki, Guna Seetharaman, Ilker Ersoy, Kannappan Palaniappan
2015 2015 IEEE International Conference on Computer Vision Workshop (ICCVW)  
AVAA offers a new framework for video understanding at scale for large enterprise applications in the government and commercial sectors.  ...  A scalable open systems and standards derived software ecosystem is described for computer vision analytics (CVA) assisted exploitation of full motion video (FMV).  ...  This document is approved for public release, Distribution Unlimited.  ... 
doi:10.1109/iccvw.2015.139 dblp:conf/iccvw/ThissellCSSRPMM15 fatcat:jpp7pwvz2rauzn5ak7xxtyz73q

BigDataGrapes D3.2 - Data Ingestion & Integration Components

Mihalis Papakonstantinou, Timotheos Lanitis, Giannis Stoitshs
2020 Zenodo  
This accompanying document for deliverable D3.2 Data Ingestion & Integration Components describes the mechanisms and tools that will be used in the BigDataGrapes platform to ingest data of different nature  ...  Also, the document describes the tools that will be used for data integration across the different BigDataGrapes platform layers, as well as for long-term storage and preservation of data.  ...  Apache Flume is a tool which has been designed specifically for ingesting stream data. Flume is distributed in nature, and its flexible architecture makes it a robust solution.  ... 
doi:10.5281/zenodo.4546037 fatcat:pvctqn6ii5fwjhjau2j7obleja

Towards a unified storage and ingestion architecture for stream processing

Ovidiu-Cristian Marcu, Alexandru Costan, Gabriel Antoniu, Maria S. Perez-Hernandez, Radu Tudoran, Stefano Bortoli, Bogdan Nicolae
2017 2017 IEEE International Conference on Big Data (Big Data)  
In this position paper, we argue for a unified ingestion and storage architecture for streaming data that addresses the aforementioned challenge.  ...  Current streaming-oriented runtimes and middlewares are not flexible enough to deal with this trend, as they address ingestion (collection and pre-processing of data streams) and persistent storage (archival  ...  Partitioning for streaming is a recognized technique used in order to increase processing throughput and scalability, e.g., [14], [15].  ... 
doi:10.1109/bigdata.2017.8258196 dblp:conf/bigdataconf/MarcuCAPTBN17 fatcat:w5a2fruranfflbwchhfuerqrpm

ToMaR -- A Data Generator for Large Volumes of Content

Rainer Schmidt, Matthias Rella, Sven Schlarb
2014 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
ToMaR specifically addresses the need for extracting data sets from large volumes of binary content based on existing, content-specific applications within a scalable data management environment.  ...  The work is motivated by scenarios for scalable content processing developed in the context of the EC project SCAPE.  ...  ACKNOWLEDGMENT Work presented in this paper is primarily supported by European Community's Seventh Framework Programme through the project SCAPE under grant agreements No 270137.  ... 
doi:10.1109/ccgrid.2014.88 dblp:conf/ccgrid/SchmidtRS14 fatcat:olnhs66avbah7m2l7clnobxpfq

A Rapid Deployment Big Data Computing Platform for Cloud Robotics

Leigh Duggan, James Dowzard, Jayantha Katupitiya, Ka C. Chan
2017 International Journal of Computer Networks & Communications  
With the ecosystem of big data technologies expanding in recent years, a review of the most relevant technologies for cloud robotics is appropriate to demonstrate and validate the proposed architectural  ...  By providing a general-purpose architecture, it is hoped that this framework will allow future research to build upon and begin to create a standardised platform, where research can be easily repeated,  ...  Lastly, the requirement for a consistent code-base for both batch and stream processing is becoming an important factor when selecting big data frameworks.  ... 
doi:10.5121/ijcnc.2017.9606 fatcat:3oxsrywvibgoxersqsbpp5xjsm

A real-time big data sentiment analysis for Iraqi tweets using Spark Streaming

Nashwan Dheyaa Zaki, Nada Yousif Hashim, Yasmin Makki Mohialden, Mostafa Abdulghafoor Mohammed, Tole Sutikno, Ahmed Hussein Ali
2020 Bulletin of Electrical Engineering and Informatics  
The framework provides a solution to the volume of data by HDFS storage of Spark. We provide parallel data gathering nodes and parallel processing nodes for scalable stream data.  ...  This framework was proposed to gather, filter, and mine streams of data in three main phases of ingestion, processing, and visualization.  ... 
doi:10.11591/eei.v9i4.1897 fatcat:yshzohdeyjdd7db4cszerd7sn4

Robust and Scalable Entity Alignment in Big Data [article]

James Flamino, Christopher Abriola, Ben Zimmerman, Zhongheng Li, Joel Douglas
2020 arXiv   pre-print
Within this pipeline we introduce scalable feature extraction for robust temporal attributes, accompanied by novel and efficient clustering algorithms in order to find groupings of similar nodes across  ...  With the advent of big data, there is a growing need to provide analysis on graphs of massive scale.  ...  On the DS1 dataset (100K entities), we ingest a month of data at a time, compute alignments on each ingested stream, and accumulate alignments over all data streams.  ... 
arXiv:2004.08991v1 fatcat:fsq5zpot2fa2tcyrdidvrdbtr4