A Scalable and Robust Framework for Data Stream Ingestion
[article]
2018
arXiv
pre-print
This paper investigates the fundamental requirements and the state of the art of existing data stream ingestion systems, proposes a scalable and fault-tolerant data stream ingestion and integration framework ...
that can serve as a reusable component across many feeds of structured and unstructured input data in a given platform, and demonstrates the utility of the framework in a real-world data stream processing ...
ACKNOWLEDGMENT Special thanks to Southern Ontario Smart Computing for Innovation Platform (SOSCIP) and IBM Canada for supporting this research project. ...
arXiv:1812.04197v1
fatcat:freh5fgeu5ezhbfi5lutmc6smy
BigDataGrapes D3.2 - Data Ingestion & Integration Components
2018
Zenodo
This accompanying document for deliverable D3.2 Data Ingestion & Integration Components describes the mechanisms and tools that will be used in the BigDataGrapes platform to ingest data of different nature ...
Also, the document describes the tools that will be used for data integration across the different BigDataGrapes platform layers, as well as for long-term storage and preservation of data. ...
Apache Flume 1 is a tool which has been designed specifically for ingesting stream data. Flume is distributed in nature, and its flexible architecture makes it a robust solution. ...
doi:10.5281/zenodo.1482750
fatcat:p3i7hezygvbhhfod55rb34ss2y
Big Data Real Time Ingestion and Machine Learning
2018
2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP)
Rather than waiting for data to be collected as a whole at long periodic intervals, streaming analysis lets us identify patterns, and make decisions based on them, as data start arriving. ...
Data arrives in all shapes and sizes. Often, data are acquired sequentially, as an infinite, ever-growing stream. ...
With respect to ingestion, Apache Kafka and Flume are two popular choices for high-velocity Big Data ingestion due to their horizontal scalability and robust failover. ...
doi:10.1109/dsmp.2018.8478598
fatcat:e5emr6hxrffvlgxcvxloxtwudm
BigDataGrapes D2.3 - BigDataGrapes Software Stack Design
2021
Zenodo
Deliverable D2.3 is the first submitted iteration of a living document that will describe the components and architecture of the BigDataGrapes software stack, providing design principles and technical ...
To this end, the specifications document is treated as a living document, with regular submission to the EC of versions that report on significant changes in design and functionality. ...
Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. ...
doi:10.5281/zenodo.4546026
fatcat:a35e4prb5ffqjmsammsac4e52q
BigDataGrapes D2.3 - BigDataGrapes Software Stack Design
2019
Zenodo
Deliverable D2.3 is the first submitted iteration of a living document that will describe the components and architecture of the BigDataGrapes software stack, providing design principles and technical ...
To this end, the specifications document is treated as a living document, with regular submission to the EC of versions that report on significant changes in design and functionality. ...
Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. ...
doi:10.5281/zenodo.3269261
fatcat:7czrc4pqb5hn5kuwh34ytuzmjq
ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare
[article]
2019
arXiv
pre-print
data platform for inferring population as well as patient level insights for Social Determinants of Health (SDoH), and (3) ingesting live-streaming data from various IoT sensors to build models, which ...
fit for healthcare due to complexities in handling data quality issues, mandates to demonstrate clinical relevance, and a lack of ability to monitor performance in a highly regulated environment with ...
Finally, we would like to thank everyone on the Isthmus team for making Isthmus a success.
Disclaimer Trademark products mentioned in this paper are properties of respective trademark owners. ...
arXiv:1909.13343v2
fatcat:iis6r37t55davmvag4g2sxo6pq
BigDataGrapes D2.3 - BigDataGrapes Software Stack Design
[article]
2018
Zenodo
Deliverable D2.3 is the first submitted iteration of a living document that will describe the components and architecture of the BigDataGrapes software stack, providing design principles and technical ...
To this end, the specifications document is treated as a living document, with regular submission to the EC of versions that report on significant changes in design and functionality. ...
A Java-based ingestion tool, Flume is used when input data streams in faster than it can be consumed. Typically, Flume is used to ingest streaming data into HDFS. ...
doi:10.5281/zenodo.1308602
fatcat:lwb4dthdunhtvfozf43qitppjy
Data Ingestion using a Novel Method: H-Stream Framework
2020
International journal of recent technology and engineering
Hadoop's data ingestion tools play a key role in processing streamed log data. As the volume of data increases, the performance of these ingestion tools degrades linearly. ...
known as H-Stream framework. ...
For these approaches, the streaming processing pipeline plays a key role in the ingestion of data. [2] A. ...
doi:10.35940/ijrte.e6045.018520
fatcat:474nhdtervd4nbsqqkhwodzzwq
A Scalable Architecture for Operational FMV Exploitation
2015
2015 IEEE International Conference on Computer Vision Workshop (ICCVW)
AVAA offers a new framework for video understanding at scale for large enterprise applications in the government and commercial sectors. ...
A scalable open systems and standards derived software ecosystem is described for computer vision analytics (CVA) assisted exploitation of full motion video (FMV). ...
This document is approved for public release, Distribution Unlimited. ...
doi:10.1109/iccvw.2015.139
dblp:conf/iccvw/ThissellCSSRPMM15
fatcat:jpp7pwvz2rauzn5ak7xxtyz73q
BigDataGrapes D3.2 - Data Ingestion & Integration Components
2020
Zenodo
This accompanying document for deliverable D3.2 Data Ingestion & Integration Components describes the mechanisms and tools that will be used in the BigDataGrapes platform to ingest data of different nature ...
Also, the document describes the tools that will be used for data integration across the different BigDataGrapes platform layers, as well as for long-term storage and preservation of data. ...
Apache Flume 1 is a tool which has been designed specifically for ingesting stream data. Flume is distributed in nature, and its flexible architecture makes it a robust solution. ...
doi:10.5281/zenodo.4546037
fatcat:pvctqn6ii5fwjhjau2j7obleja
Towards a unified storage and ingestion architecture for stream processing
2017
2017 IEEE International Conference on Big Data (Big Data)
In this position paper, we argue for a unified ingestion and storage architecture for streaming data that addresses the aforementioned challenge. ...
Current streaming-oriented runtimes and middlewares are not flexible enough to deal with this trend, as they address ingestion (collection and pre-processing of data streams) and persistent storage (archival ...
Partitioning for streaming is a recognized technique used in order to increase processing throughput and scalability, e.g., [14], [15]. ...
doi:10.1109/bigdata.2017.8258196
dblp:conf/bigdataconf/MarcuCAPTBN17
fatcat:w5a2fruranfflbwchhfuerqrpm
ToMaR -- A Data Generator for Large Volumes of Content
2014
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
ToMaR specifically addresses the need for extracting data sets from large volumes of binary content based on existing, content-specific applications within a scalable data management environment. ...
The work is motivated by scenarios for scalable content processing developed in the context of the EC project SCAPE. ...
ACKNOWLEDGMENT Work presented in this paper is primarily supported by European Community's Seventh Framework Programme through the project SCAPE under grant agreements No 270137. ...
doi:10.1109/ccgrid.2014.88
dblp:conf/ccgrid/SchmidtRS14
fatcat:olnhs66avbah7m2l7clnobxpfq
A Rapid Deployment Big Data Computing Platform for Cloud Robotics
2017
International Journal of Computer Networks & Communications
With the ecosystem of big data technologies expanding in recent years, a review of the most relevant technologies for cloud robotics is appropriate to demonstrate and validate the proposed architectural ...
By providing a general-purpose architecture, it is hoped that this framework will allow future research to build upon and begin to create a standardised platform, where research can be easily repeated, ...
Lastly, the requirement for a consistent code-base for both batch and stream processing is becoming an important factor when selecting big data frameworks. ...
doi:10.5121/ijcnc.2017.9606
fatcat:3oxsrywvibgoxersqsbpp5xjsm
A real-time big data sentiment analysis for Iraqi tweets using Spark Streaming
2020
Bulletin of Electrical Engineering and Informatics
The framework addresses the volume of data by using HDFS storage with Spark. We provide parallel data gathering nodes and parallel processing nodes for scalable stream data. ...
This framework was proposed to gather, filter, and mine streams of data in three main phases of ingestion, processing, and visualization. ...
doi:10.11591/eei.v9i4.1897
fatcat:yshzohdeyjdd7db4cszerd7sn4
Robust and Scalable Entity Alignment in Big Data
[article]
2020
arXiv
pre-print
Within this pipeline we introduce scalable feature extraction for robust temporal attributes, accompanied by novel and efficient clustering algorithms in order to find groupings of similar nodes across ...
With the advent of big data, there is a growing need to provide analysis on graphs of massive scale. ...
On the DS1 dataset (100K entities), we ingest a month of data at a time, compute alignments on each ingested stream, and accumulate alignments over all data streams. ...
arXiv:2004.08991v1
fatcat:fsq5zpot2fa2tcyrdidvrdbtr4