
Usages of Spark Framework with Different Machine Learning Algorithms

Mohamed Ali Mohamed, Ibrahim Mahmoud El-henawy, Ahmad Salah, Ahmed Mostafa Khalil
2021 Computational Intelligence and Neuroscience  
As a consequence of this exponential growth of data, a new term and idea known as big data have been coined.  ...  This article focuses on three machine learning types, including regression, classification, and clustering, and how they can be applied on top of the Spark platform.  ...  RDD provides data analysis activities, such as transformation and manipulation of data, which may be utilized via the use of additional Spark libraries and tools.  ... 
doi:10.1155/2021/1896953 fatcat:y3bkzwmtt5cfnmscww33qvydiu

A Comparative Study on Big Data Analytics Frameworks, Data Resources and Challenges

Flasteen Abuqabita, Razan Al-Omoush, Jaber Alwidian
2019 Modern Applied Science  
In this study we categorized the existing frameworks which is used for processing the big data into three groups, namely as, Batch processing, Stream analytics and Interactive analytics, we discussed each  ...  of them in detailed and made comparison on each of them.  ...  which is library used for complex event processing and pattern detection, and continuous event stream analysis, those libraries are not fully self-contained on flink core rather it embed on API,flink  ... 
doi:10.5539/mas.v13n7p1 fatcat:74icluidsnbd7koqximzjxzfii

BigDataGrapes D4.3 - Models and Tools for Predictive Analytics over Extremely Large Datasets

Nicola Tonellotto, Vinicius Monteiro de Lira, Franco Maria Nardini, Raffaele Perego, Cristina Muntean, Ida Mele, Salvatore Trani
2018 Zenodo  
On top of this stack, the BDG platform enables distributed predictive big data analytics by effectively exploiting scalable Machine Learning algorithms using efficiently the computational resources of  ...  They thus include everything needed to run the supported predictive data analytics tools on any system that can run a Docker engine.  ...  We finally introduce MLLib, the machine learning library working on Apache Spark that allows distributed and parallel learning from big data of effective models for regression and classification tasks.  ... 
doi:10.5281/zenodo.1481800 fatcat:rlqwgvajzre6pfxuiiclmk2r34

Big Data Mining: Tools & Algorithms

Adeel Shiraz Hashmi, Tanvir Ahmad
2016 International Journal of Recent Contributions from Engineering, Science & IT  
The data mining tools and algorithms which can handle big data have also been summarized, and one of the tools has been used for mining of large datasets using distributed algorithms.  ...  We are now in Big Data era, and there is a growing demand for tools which can process and analyze it.  ...  Apache Spark can also handle data streams through Spark Streaming library.  ... 
doi:10.3991/ijes.v4i1.5350 fatcat:ezumvjhrhnbfbamy6xlgz7dztm

Parallel processing on Big Data in the context of Machine Learning and Hadoop Ecosystem: A Survey

Anilkumar Vishwanath Brahmane, R Murugan
2018 International Journal of Engineering & Technology  
On the other hand, in Big Data perspective, customary information methods and policies are not as much of capable.  ...  To solve the composite Big Data constraints and difficulties, a large amount effort has been carried out.  ...  It is a recognized analytics platform that ensures a fast, easy-to-use and flexible computing. Spark handles complex analysis on large data sets.  ... 
doi:10.14419/ijet.v7i2.7.10885 fatcat:goyvvzlwsbeifi62nrldkgp3yy

Big Data in Smart City: Management Challenges

Mladen Amović, Miro Govedarica, Aleksandra Radulović, Ivana Janković
2021 Applied Sciences  
In this paper, we suggest the biG dAta sMart cIty maNagEment SyStem (GAMINESS) that is based on the Apache Spark big data framework.  ...  The developed model has the ability to exchange data regardless of the used standard or the data format into proposed Apache Spark data framework schema.  ...  Data Availability Statement: Not applicable. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/app11104557 fatcat:me2gyg3lmvdhnn4yo4g3gkbuuq

Developing a monitoring system for Cloud-based distributed data-centers

Domenico Elia, Gioacchino Vino, Giacinto Donvito, Marica Antonacci, A. Forti, L. Betev, M. Litmaath, O. Smirnova, P. Hristov
2019 EPJ Web of Conferences  
Grafana and Kibana are used to show data in a dedicated dashboards. The Root-cause analysis engine has been implemented using custom machine learning algorithms.  ...  Apache Spark has been selected as analysis component.  ...  The event extractor algorithm is an incremental Self-Organizing Map that associates points with similar behavior to the same label and marks each label with a critical score: a point belonging to a label  ... 
doi:10.1051/epjconf/201921408012 fatcat:eedsbgog4zf7bgmun3jarxtnda

A Survey on Some Big Data Applications Tools and Technologies

Nazia Tazeen, Sandhya Rani K.
2021 International journal of recent technology and engineering  
Apache Hadoop software is a store of accessible source programs to store big data and perform analytics and various other operations related to big data.  ...  Big Data Applications, tools and technologies used to handle it are briefly discussed in this paper.  ...  HDFS is used to store huge data files, which are high to store on a single machine typically in gigabyte to terabyte.  ... 
doi:10.35940/ijrte.f5575.039621 fatcat:zf35wmnfbzcqtdtjhobgyvfkpq

Big data clustering techniques based on Spark: a literature review

Mozamel M. Saeed, Zaher Al Aghbari, Mohammed Alsharidah
2020 PeerJ Computer Science  
A popular unsupervised learning method, known as clustering, is extensively used in data mining, machine learning and pattern recognition.  ...  Therefore, this survey aims to present a comprehensive summary of the previous studies in the field of Big Data clustering using Apache Spark during the span of 2010–2020.  ...  The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.  ... 
doi:10.7717/peerj-cs.321 pmid:33816971 pmcid:PMC7924475 fatcat:hmpihu6qvncsbammunbkmlybma

A smart method for spark using neural network for big data

Md. Armanur Rahman, J. Hossen, Aziza Sultana, Abdullah Al Mamun, Nor Azlina Ab. Aziz
2021 International Journal of Power Electronics and Drive Systems (IJPEDS)  
Apache spark, famously known for big data handling ability, is a distributed open-source framework that utilizes the idea of distributed memory to process big data.  ...  This paper proposes a more effective, self-tuning approach subject to a neural network called Smart method for spark using neural network for big data (SSNNB) to avoid the disadvantages of manual tuning  ...  The authors also would like to acknowledge the anonymous reviewers for their valuable comments and insights.  ... 
doi:10.11591/ijece.v11i3.pp2525-2534 fatcat:gla6j5t5ozd3bfuq6eivxuroee

BigDataGrapes D4.3 - Models and Tools for Predictive Analytics over Extremely Large Datasets

Nicola Tonellotto, Vinicius Monteiro de Lira, Franco Maria Nardini, Raffaele Perego, Cristina Muntean, Ida Mele, Salvatore Trani, Matteo Ceneta
2019 Zenodo  
On top of this stack, the BDG platform enables distributed predictive big data analytics by effectively exploiting scalable Machine Learning algorithms using the computational resources of the underlying  ...  They thus include everything needed to run the supported predictive data analytics tools on any system that can run a Docker engine.  ...  We finally introduce MLLib, the machine learning library working on Apache Spark that allows distributed and parallel learning from big data of effective models for regression and classification tasks.  ... 
doi:10.5281/zenodo.2641952 fatcat:n6ag6qt4gzg6tmnytqs2f7op4u

Architecture of a Compact Data GRID Cluster for Teaching Modern Methods of Data Mining in the Virtual Computer Lab

Mikhail Belov, Vladimir Korenkov, Nadezhda Tokareva, Eugenia Cheremisina, Gh. Adam, J. Buša, M. Hnatič
2020 EPJ Web of Conferences  
This paper discusses the architecture of a compact Data GRID cluster for teaching new methods of Big Data analytics in the Virtual Computer Lab.  ...  based on these data.  ...  Apache Spark is a distributed computing framework that makes Big Data processing easy, fast, and scalable.  ... 
doi:10.1051/epjconf/202022603004 fatcat:rsoyqv4tzbacnmnozn3irwdjrq

A Survey on Spark Ecosystem for Big Data Processing [article]

Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li
2018 arXiv pre-print
With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analysis Big Data.  ...  Finally, we make a discussion on the open issues and challenges for large-scale in-memory data processing with Spark.  ...  In the numeric analysis and machine learning domains, R [39] is a popular programming language widely used by data scientists for statistical computing and data analysis.  ... 
arXiv:1811.08834v1 fatcat:6fxvg6me7rayzm4suoabyg7fii

Smart Grid Big Data Analytics: Survey of Technologies, Techniques, and Applications

Dabeeruddin Syed, Ameema Zainab, Shady S. Refaat, Haitham Abu-Rub, Othmane Bouhali
2020 IEEE Access  
Hence, the triumph of the smart grid energy paradigm depends on the factor of big data analytics.  ...  Such transformation is linked to adding a large number of smart meters and other sources of information extraction units. This provides various opportunities associated with the collected big data.  ...  MLib: Apache Spark comprises a library with common machine learning functionality and this library is called MLib.  ... 
doi:10.1109/access.2020.3041178 fatcat:awgtqx6nordadbtjn2a4v4nxe4

BigDataGrapes D2.3 - BigDataGrapes Software Stack Design

Panagis Katsivelis
2021 Zenodo  
The current version of the specification is based on the preliminary analysis of the use cases provided by the three BigDataGrapes industrial partners, covering the basic functional and non-functional  ...  To this end, the specifications document is treated as a living document, with regular submission to the EC of versions that report on significant changes in design and functionality.  ...  Spark is a lightning-fast unified analytics engine for big data and machine learning. Compare to Hadoop, both Hadoop and Apache Spark are big-data frameworks, but they serve for different purposes.  ... 
doi:10.5281/zenodo.4546026 fatcat:a35e4prb5ffqjmsammsac4e52q