BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink

Elham Nazari, Mohammad Hasan Shahriari, Hamed Tabesh
2019 Frontiers in Health Informatics  
works for slow complex analyzes and does not support flow processing, Apache Spark is also distributed as a computational platform that can process a big data set in memory with a very fast response time  ...  Overall, the findings showed that the Apache Hadoop environment has simplicity, error detection, and scalability management based on clusters, but because its processing is based on batch processing, it  ...  The same flake with Spark can do structured query language (SQL), graph, machine learning and stream processing.  ... 
doi:10.30699/fhi.v8i1.180 fatcat:bmj24xxzbffnnhylvl7er6cytu

A comparison on scalability for batch big data processing on Apache Spark and Apache Flink

Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
2017 Big Data Analytics  
Recently a novel framework called Apache Flink has emerged, focused on distributed stream and batch data processing.  ...  In this paper we perform a comparative study on the scalability of these two frameworks using the corresponding Machine Learning libraries for batch data processing.  ...  The first one was Mahout [8] (as part of Apache Hadoop [3] ), followed by MLlib [9] which is part of Spark project [5] .  ... 
doi:10.1186/s41044-016-0020-2 fatcat:b6uqpjj7nfei7lckkafbrdktpi

Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service [article]

Radwa Elshawi, Sherif Sakr
2017 arXiv pre-print
Samsara compiles, optimizes and executes its programs on distributed dataflow systems (e.g., Apache Spark, Apache Flink, H2O).  ...  Apache Mahout [36] is an open-source toolkit which is designed to solve very practical and scalable machine learning problems on top of the Hadoop platform.  ... 
arXiv:1709.07493v1 fatcat:ja7rrbfk7vhnpjvqadlr5vvhyy

Big Data Application Performance Monitoring in Retail E-Commerce using Spark

Lavanya Marasa, Kalyani Kunchum
2017 International Journal of Engineering Trends and Technology
, "how similar are they to one another" and "what else might they be interested in viewing?".  ...  Apache Spark, the trendy big data processing engine that offers faster solutions for any failures compared to Hadoop, can be effectively utilized in finding patterns of relevance useful for the common  ...  Being a batch processing system, Hadoop users have to depend on other platforms like Storm for real time data processing, Mahout for machine learning or Graph for graph processing.  ... 
doi:10.14445/22315381/ijett-v50p211 fatcat:kjhsay6ogfc6de5zr5gglq735e

Performance Comparison of a Parallel Recommender Algorithm Across Three Hadoop-Based Frameworks

Christina Diedhiou, Bryan Carpenter, Aamir Shafi, Soumabha Sarkar, Ramazan Esmeli, Ryan Gadsdon
2018 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)  
On the human side, one of the aids to finding the things people really want is recommendation systems.  ...  We benchmark the performance and demonstrate parallel speedup on Movielens and Yahoo Music data sets, comparing our results with two other frameworks: Mahout and Spark.  ...  The ALSWR implementation with Apache Mahout is done through its machine learning library and more specifically the map-reduce implementation of ALS.  ... 
doi:10.1109/cahpc.2018.8645926 dblp:conf/sbac-pad/DiedhiouCSSEG18 fatcat:oi5zqr677rdp7edkytqglctnnm

A Review on Latest Technologies in Big Data Analysis

Castro S, Pushpalakshmi R
2018 International Journal of Engineering & Technology  
Thus, the recent researches have focused on the analysis of big data.  ...  In this digital world, the modern information systems have produced a large amount of data which needs huge depositary in terms of terabytes for storage.  ...  Apache Mahout: The main objective of apache mahout is to present a commercial based and scalable machine learning strategies for the applications of intelligent and wide scale data analysis.  ... 
doi:10.14419/ijet.v7i3.1.16806 fatcat:fvxtqfgysrhtbiakgnvowceyhm

Identifying Requirements for Big Data Analytics and Mapping to Hadoop Tools

2019 International journal of recent technology and engineering  
Also, for each identified category, comparison of Hadoop tools based on important parameters is presented.  ...  The tools have been thoroughly studied and analyzed based on their suitability for the different requirements of big data analytics.  ...  learning applications Apache Mahout is a software library of scalable machine learning algorithms, implemented on the top of Apache Hadoop, once data is stored on HDFS [17].  ... 
doi:10.35940/ijrte.c5524.098319 fatcat:zgw5y6nucve3jo36sqeio3wukq

Hadoop Ecosystem: An Introduction

2016 International Journal of Science and Research (IJSR)  
Hadoop a de facto industry standard has become kernel of the distributed operating system for Big data.  ...  But, No one uses kernel alone. "Hadoop" is taken to be a combination of HDFS and MapReduce.  ...  Apache Mahout Apache Mahout [9] [14] is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily  ... 
doi:10.21275/v5i6.nov164121 fatcat:fep6wrhjdfcknjghm3nnvdqmnq

A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools

D. P., Kauser Ahmed
2016 International Journal of Advanced Computer Science and Applications  
Additionally, it opens a new horizon for researchers to develop the solution, based on the challenges and open research issues.  ...  Social computing includes social network analysis, online communities, recommender systems, reputation systems, and prediction markets where as internet search indexing includes ISI, IEEE Xplorer, Scopus  ...  Apache Mahout Apache mahout aims to provide scalable and commercial machine learning techniques for large scale and intelligent data analysis applications.  ... 
doi:10.14569/ijacsa.2016.070267 fatcat:6g2xv2q4ijcvpgikxjzomgjc5a

A survey of open source tools for machine learning with big data in the Hadoop ecosystem

Sara Landset, Taghi M. Khoshgoftaar, Aaron N. Richter, Tawfiq Hasanin
2015 Journal of Big Data  
We then look at machine learning libraries and frameworks including Mahout, MLlib, SAMOA, and evaluate them based on criteria such as scalability, ease of use, and extensibility.  ...  The world's data is growing rapidly, and traditional tools for machine learning are becoming insufficient as we move towards distributed and real-time processing.  ...  In a comparison between MLI (an API for distributed machine learning built on Spark), GraphLab, Mahout, and MATLAB of collaborative filtering with alternating least squares [106] , it was observed that  ... 
doi:10.1186/s40537-015-0032-1 fatcat:zgcsiokrynfhzbmaudqf7rcll4

Experience report: A characteristic study on out of memory errors in distributed data-parallel applications

Lijie Xu, Wensheng Dou, Feng Zhu, Chushu Gao, Jie Liu, Hua Zhong, Jun Wei
2015 2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE)  
This paper presents a comprehensive characteristic study on 123 real-world OOM errors in Hadoop and Spark applications.  ...  Out of memory (OOM) errors occur frequently in data-intensive applications that run atop distributed dataparallel frameworks, such as MapReduce and Spark.  ...  ., 3 input splits in Fig. 1 ), and stored on the distributed file system.  ... 
doi:10.1109/issre.2015.7381844 dblp:conf/issre/XuDZGLZW15 fatcat:tst2cqbngrhxxlc5bxfmbtrpyu

Big Data Analytics = Machine Learning + Cloud Computing [chapter]

C. Wu, R. Buyya, K. Ramamohanarao
2016 Big Data  
Mahout The original object of Mahout was to build a Java-based machine learning library that covers all machine learning algorithms or techniques in theory but it can mainly handle three types of machine  ...  Since all the algorithms of both Lucene and Mahout are closely associated with the concept of machine learning, In Apr-2010, Mahout has risen as a top level project in its own right.  ...  ML + CC  BDA and Guidelines We discussed the role of machine learning (ML), Cloud Computing (CC), and Hadoop like systems. We see that ML and CC are the two most important components of BDA.  ... 
doi:10.1016/b978-0-12-805394-2.00001-5 fatcat:2a2avnxwivbztmp7iksxqgkv2a

An efficient technique to improve resources utilization for hadoop MapReduce in heterogeneous system

Ahmed Qasim Mohammed, Rajesh Bharati
2017 2017 International Conference on Intelligent Communication and Computational Techniques (ICCT)  
Oughties witness releasing one of the most  ...  -6: shows the dataflow in our system For our experiments we configured a cluster of 3 nodes working on centos operating system with stable version of Hadoop on each node.  ...  MapReduce Architecture MapReduce it is one of the most powerful tool to process Big Data in parallel but it process rest data, originally MapReduce [6] designed to process data in distributed system  ... 
doi:10.1109/intelcct.2017.8324012 fatcat:6wvscll2f5cg3innhn6olctyly

Big Data Analytics: A Perspective View

Suman Pandey
2017 International Journal of Advanced Research in Computer Science and Software Engineering  
This paper studies the content, scope, methods, advantages and challenges of big data and also discusses privacy issue concern on it.  ...  the job, invoking user defined policies and dynamically updating the job graph. 3) Apache Mahout The Apache Mahout [39] is scalable and commercial machine learning technique for vast unstructured  ...  Therefore it is used in interactive operation system, real-time analytics, on-line machine learning, distributed RPC continuous computation, and ETL.  ... 
doi:10.23956/ijarcsse/sv7i5/0237 fatcat:mq75vo3n4rbihnc43mtpktzrru

Scaling data mining in massively parallel dataflow systems

Sebastian Schelter
2014 Proceedings of the 2014 SIGMOD PhD symposium on - SIGMOD'14 PhD Symposium  
systems and dataflow systems like Apache Flink.  ...  Additionally, we give an outlook on non-dataflow architectures for asynchronous distributed machine learning.  ... 
doi:10.1145/2602622.2602631 dblp:conf/sigmod/Schelter14 fatcat:kejcowbahrfl7mguzivnolwtoq