Towards Distributed Model Analytics with Apache Spark

Önder Babur, Loek Cleophas, Mark van den Brand
2018 Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development  
Towards Distributed Model Analytics with Apache Spark.  ...  In this paper we extend SAMOS to operate on Apache Spark, a popular engine for distributed Big Data processing, by partitioning the data and parallelizing the comparison and analysis phase.  ...  ACKNOWLEDGMENTS This work is supported by the 4TU.NIRICT Research Community Funding on Model Management and Analytics in the Netherlands.  ... 
doi:10.5220/0006735407670772 dblp:conf/modelsward/BaburCB18 fatcat:kq4punadorcp3k3m6rpw5rwsqm

BigDataGrapes D4.3 - Models and Tools for Predictive Analytics over Extremely Large Datasets

Nicola Tonellotto, Vinicius Monteiro de Lira, Franco Maria Nardini, Raffaele Perego, Cristina Muntean, Ida Mele, Salvatore Trani
2018 Zenodo  
This accompanying document for deliverable D4.3 (Models and Tools for Predictive Analytics over Extremely Large Datasets) describes the first version of the mechanisms and tools supporting efficient and  ...  The document details the steps to be followed to download and deploy the first version of the BDG platform and provides the reader with practical examples of usage of its scalable predictive analytics  ...  Sparkling Water and Apache Spark allows for a seamless experience for users who want to interact with distributed databases/filesystems, build a model and make predictions, and then use the results again  ... 
doi:10.5281/zenodo.1481800 fatcat:rlqwgvajzre6pfxuiiclmk2r34

Apache Spark usage and deployment models for scientific computing

Diogo Castro, Prasanth Kothuri, Piotr Mrowczynski, Danilo Piparo, Enric Tejedor, A. Forti, L. Betev, M. Litmaath, O. Smirnova, P. Hristov
2019 EPJ Web of Conferences  
Among many frameworks, Apache Spark is currently getting the most traction from various user communities and new ways to deploy Spark such as Apache Mesos or Spark on Kubernetes have started to evolve  ...  The second part of the talk touches upon the evolution of the Apache Spark data analytics platform, particularly sharing the recent work done to run Spark on Kubernetes on the virtualized and container-based  ...  This makes it a truly elastic unified data analysis platform with efficient usage of CERN computing resources.  ... 
doi:10.1051/epjconf/201921407020 fatcat:nd3s4cqnjzc3babezy5xjsghum

Real-time Text Analytics Pipeline Using Open-source Big Data Tools [article]

Hassan Nazeer, Waheed Iqbal, Fawaz Bokhari, Faisal Bukhari, Shuja Ur Rehman Baig
2017 arXiv   pre-print
Our proposed data processing pipeline is based on Apache Kafka for data ingestion, Apache Spark for in-memory data processing, Apache Cassandra for storing processed results, and D3 JavaScript library  ...  Distributed deployment with 2 instances Deployed Apache Spark, Apache Kafka and Apache Cassandra as a cluster using two virtual machines. 3.  ...  Then Apache Spark consumes the data and performs predictive analytics using Spark's MLlib module [11].  ... 
arXiv:1712.04344v1 fatcat:gwy3fupgurfengu2iz6l4j3nyy

Approximate Stream Analytics in Apache Flink and Apache Spark Streaming [article]

Do Le Quoc, Ruichuan Chen, Pramod Bhatotia, Christof Fetzer, Volker Hilt, Thorsten Strufe
2017 arXiv   pre-print
Our results show that Spark- and Flink-based StreamApprox systems achieve a speedup of 1.15×-3× compared to the respective native Spark Streaming and Flink executions, with varying sampling fraction of  ...  Furthermore, we have also implemented an improved baseline in addition to the native execution baseline - a Spark-based approximate computing system leveraging the existing sampling modules in Apache Spark  ...  For a fair comparison with the sampling algorithms available in Apache Spark, we also built an Apache Spark-based approximate computing system for stream analytics (as described in §4).  ... 
arXiv:1709.02946v1 fatcat:ejcj5eugkjbxpo5bet22dhv7jq

StreamApprox

Do Le Quoc, Ruichuan Chen, Pramod Bhatotia, Christof Fetzer, Volker Hilt, Thorsten Strufe
2017 Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference on - Middleware '17  
Our results show that Spark- and Flink-based StreamApprox systems achieve a speedup of 1.15×-3× compared to the respective native Spark Streaming and Flink executions, with varying sampling fraction of  ...  To showcase the effectiveness of our algorithm, we implemented StreamApprox as a fully functional prototype based on Apache Spark Streaming and Apache Flink.  ...  For a fair comparison with the sampling algorithms available in Apache Spark, we also built an Apache Spark-based approximate computing system for stream analytics (as described in §4).  ... 
doi:10.1145/3135974.3135989 dblp:conf/middleware/QuocCBFHS17 fatcat:2zlds3w2uzbzvdga6426whk474

A NoSQL Approach for Aspect Mining of Cultural Heritage Streaming Data

Gerasimos Vonitsanos, Andreas Kanavos, Alaa Mohasseb, Dimitrios Tsolis
2019 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA)  
With the advent of social media, there is a data abundance so that analytics can be reliably designed for ultimately providing valuable information towards a given product or service.  ...  In this paper, we present a NoSQL database approach for aspect mining of a cultural heritage scenario by taking advantage of Apache Spark streaming architecture.  ...  A framework to render advanced data analytics, i.e., Apache Spark, along with a NoSQL database for handling large portions of data, i.e., Apache Cassandra, provide highly available service with no single  ... 
doi:10.1109/iisa.2019.8900770 dblp:conf/iisa/VonitsanosKMT19 fatcat:vqrlkkrxtbdwjd7ebchbreypgq

BigDataGrapes D4.3 - Models and Tools for Predictive Analytics over Extremely Large Datasets

Nicola Tonellotto, Vinicius Monteiro de Lira, Franco Maria Nardini, Raffaele Perego, Cristina Muntean, Ida Mele, Salvatore Trani, Matteo Ceneta
2019 Zenodo  
This accompanying document for deliverable D4.3 (Models and Tools for Predictive Analytics over Extremely Large Datasets) describes the first version of the mechanisms and tools supporting efficient and  ...  The document details the steps to be followed to download and deploy the first version of the BDG platform and provides the reader with practical examples of usage of its scalable predictive analytics  ...  Spark enabling distributed computation ........................................ 13 3.4.2 Reading of the data with Apache Spark (to create a Resilient Distributed Dataset (RDD) .............. 14 3.4.3  ... 
doi:10.5281/zenodo.2641952 fatcat:n6ag6qt4gzg6tmnytqs2f7op4u

D3.13 5G Security Framework (Release A)

5GENESIS
2019 Zenodo  
In this context, among its other features, 5GENESIS includes a Security Analytics platform as a contribution towards hardening the security of next-generation networks.  ...  New vulnerabilities are identified, in particular associated with the capabilities related to network softwarisation and slicing.  ...  Apache Spark Apache Spark [12] is a distributed and highly scalable cluster-computing framework.  ... 
doi:10.5281/zenodo.5615194 fatcat:wnl5cmmgiveflcyopntbwedh7u

Enabling Big Data Analytics at Manufacturing Fields of Farplas Automotive [article]

Ozgun Akin, Halil Faruk Deniz, Dogukan Nefis, Alp Kiziltan, Altan Cakir
2020 arXiv   pre-print
Apache Kafka and Apache Spark implementation on Apache Hadoop cluster, and identifying the challenges and issues occurring with implementation the Farplas manufacturing company, which is one of the biggest  ...  The term Industry 4.0 stands for today industrial digitization which is defined as a new level of organization and control over the entire value chain of the life cycle of products; it is geared towards  ...  All this installation environment is coordinated to work together, like a Hadoop ecosystem together with an In-Memory-Flow approach with Apache Spark.  ... 
arXiv:2004.11682v1 fatcat:vwek3gfpargkzcgtpktz6p2p3m

Towards machine learning-based self-tuning of Hadoop-Spark system

Md. Armanur Rahman, Abid Hossen, J. Hossen, Venkataseshaiah C, Thangavel Bhuvaneswari, Aziza Sultana
2019 Indonesian Journal of Electrical Engineering and Computer Science  
Apache Spark is an open source distributed platform which uses the concept of distributed memory for processing big data. Spark has more than 180 predominant configuration parameters.  ...  A comparison is provided to highlight the experimented result of the proposed approach with default Spark configuration system.  ...  A simulation driven prediction model to estimate job performance with high perfection for Apache Spark is presented in [19].  ... 
doi:10.11591/ijeecs.v15.i2.pp1076-1085 fatcat:f3svmltkcrfzfhjucmk5mjiz5e

Security analytics of large scale streaming data [article]

Sheeraz Niaz Lighari
2018 The PhD Series of the Faculty of Engineering and Science  
The detection of unknown attacks using big data analytics is relatively new and advanced approach towards the big data security analytics.  ...  This model performs anomaly detection on latency values produced by the sensors. APACHE SPARK Apache spark is an open source big data project.  ... 
doi:10.5278/vbn.phd.eng.00047 fatcat:mewllh4hjzh3bjunk47gwkwa2q

Implementing a Volunteer Notification System into a Scalable, Analytical Realtime Data Processing Environment [chapter]

Jesko Elsner, Tomas Sivicki, Philipp Meisen, Tobias Meisen, Sabina Jeschke
2016 Automation, Communication and Cybernetics in Science and Engineering 2015/2016  
The pace at which next-generation Internet of Things networks, consisting of wirelessly distributed sensors and devices, are being developed is speeding up.  ...  This paper will focus on a basic concept for implementing a VNS approach into a scalable, fault-tolerant environment that uses state-of-the-art analytical tools to process information streams in real-time  ...  CONCLUSION This work illustrated details on how to implement a VNS into a distributed analytical environment with high velocity data support.  ... 
doi:10.1007/978-3-319-42620-4_64 fatcat:pd2nu6qe5ffb3m76hoctaimfse

High Performance Data Engineering Everywhere [article]

Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Supun Kamburugamuve, Thejaka Amila Kanewala, Hasara Maithree, Pulasthi Wickramasinghe, Ahmet Uyar, Gurhan Gunduz, Geoffrey Fox
2020 arXiv   pre-print
Initial experiments show that Cylon enhances popular tools such as Apache Spark and Dask with major performance improvements for key operations and better component linkages.  ...  All this demands an efficient and highly distributed integrated approach for data processing, yet many of today's popular data analytics tools are unable to satisfy all these requirements at the same time  ...  We thank Intel for their use of the Juliet and Victor systems, and extend our gratitude to the FutureSystems team for their support with the infrastructure.  ... 
arXiv:2007.09589v1 fatcat:5qm4d5e4ajhltkpxbk2z57nxii

A real-time big data sentiment analysis for iraqi tweets using spark streaming

Nashwan Dheyaa Zaki, Nada Yousif Hashim, Yasmin Makki Mohialden, Mostafa Abdulghafoor Mohammed, Tole Sutikno, Ahmed Hussein Ali
2020 Bulletin of Electrical Engineering and Informatics  
SPARK STREAMING Apache Spark is an open source framework which consists of an engine for programs distribution across machine clusters and a sophisticated model for writing programs [31] [32] [33]  ...  Spark 2.0 gives Spark users the leverage of not having to interact directly with RDD, but it is important to provide them with the robust mental model of the concept of RDD.  ... 
doi:10.11591/eei.v9i4.1897 fatcat:yshzohdeyjdd7db4cszerd7sn4