PERFORMANCE COMPARISON OF HADOOP MAPREDUCE AND APACHE PIG
2017
International Journal of Advanced Research in Computer Science
There are various tools or frameworks available for this purpose; a few of them are Apache Hadoop MapReduce, Apache Pig, Apache Hive, Apache Spark, Tez, etc. ...
Hadoop MapReduce provides a low level of abstraction, whereas Pig provides a high level of abstraction. ...
MapReduce takes less execution time than Pig. ...
doi:10.26483/ijarcs.v8i9.5177
fatcat:lcc6x4u3yzai5fbdvt4tpdugea
Hadoop based Weblog Analysis: A Review
2016
International Journal of Software Engineering and Its Applications
An enterprise weblog analysis system based on Hadoop architecture with Hadoop Distributed File System (HDFS), Hadoop MapReduce Software Framework and Pig Latin Language aids the business decision-making ...
DEMs can be utilized across a variety of scientific as well as engineering programs such as creating a hydrologic pattern, terrain study, as well as framework plan. ...
Apache Hadoop is a suite of data and programming code that is used to develop software programs and applications. ...
doi:10.14257/ijseia.2016.10.6.02
fatcat:jyxr5kmkunhy7ck2zwqytodbq4
Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud
2016
Journal of Big Data
MapReduce programs are written in different programming and scripting languages.
Apache Tez: Distributed processing is the base of Hadoop. ...
MapReduce is a framework which helps in writing programs for processing of data in parallel across thousands of machines [6]. (Fig. 1: An illustration of the Hadoop ecosystem.) ...
Apache Tez and MapReduce required a different number of containers during execution of the same script. In the previous section we already explained that MapReduce requires more containers than Apache Tez. ...
doi:10.1186/s40537-016-0051-6
fatcat:rwt5zwspyba2hhwextgb3hrgbq
A study and Performance Comparison of MapReduce and Apache Spark on Twitter Data on Hadoop Cluster
2018
International Journal of Information Technology and Computer Science
We also found an interesting result: as the number of blocks on the Hadoop Distributed File System increases, the run-time of both the MapReduce and Spark programs also increases, and even in ...
MapReduce is a high-performance distributed Big Data programming framework which is highly preferred by most big data analysts and has been available for a long time with very good documentation. ...
Apache Hadoop: One solution to the above problem with traditional RDBMS is Apache Hadoop. ...
doi:10.5815/ijitcs.2018.07.07
fatcat:vcwlnn3ocfcr7jilbky57ngpty
CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds and Analyzing High-Throughput Sequencing Data with MapReduce
2014
PLoS ONE
However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. ...
Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. ...
three steps, (2) manipulating a supported MapReduce program with the isolation method, and (3) integrating a MapReduce program with the program configuration file. ...
doi:10.1371/journal.pone.0098146
pmid:24897343
pmcid:PMC4045712
fatcat:mg255d6oxbcwfijsccwkqowgha
Distributed Computing Engines for Big Data Analytics
2019
International journal of recent technology and engineering
New programming paradigms like MapReduce, with the Hadoop distributed programming framework, are widely used. ...
The experimental results revealed that Apache Spark and Apache Flink outperform Hadoop. This is evaluated with different benchmark big data workloads. ...
MapReduce is the distributed programming framework that is widely used [1]. However, it has its limitations when compared with the newly emerged frameworks like Apache Flink and Apache Spark. ...
doi:10.35940/ijrte.b3771.078219
fatcat:bbpnou3i2rbtdiqxufjj2kdzwe
Apache Hadoop Architecture, Applications, and Hadoop Distributed File System
2022
Semiconductor Science and Information Devices
This paper introduces the Apache Hadoop architecture, the components of Hadoop, and their significance in managing vast volumes of data in a distributed system. ...
One of the most beneficial software frameworks used to utilize data in distributed systems is Hadoop. ...
Apache Hadoop was created with the intention of being used in computer clusters made up of commodity hardware. With a collection of commodity clusters, Apache Hadoop enables distributed computing. ...
doi:10.30564/ssid.v4i1.4619
fatcat:fcujwqcedzfdjgmerr3s6l6eq4
A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce
2014
International Journal of Computer Science & Engineering Survey
Here arises the need for network traffic and weblog analysis, for which Hadoop comes as a suggested solution. ...
ACKNOWLEDGEMENT We are greatly indebted to the college management and the faculty members for providing necessary hardware and other facilities along with timely guidance and suggestions for implementing ...
One of the key characteristics of Hadoop MapReduce is its coding complexity; it can be handled only by programmers with highly developed programming skills. ...
doi:10.5121/ijcses.2014.5101
fatcat:pknnwf36zzfjtksiselmhbrobe
Network Traffic Analysis: Hadoop Pig vs Typical MapReduce
[article]
2013
arXiv
pre-print
Thousands of transaction requests are handled and processed everyday by different websites associated with e-commerce, e-banking, e-shopping carts etc. ...
The network traffic and weblog analysis comes to play a crucial role in such situations where Hadoop can be suggested as an efficient solution for processing the Netflow data collected from switches as ...
ACKNOWLEDGEMENT We are greatly indebted to the college management and the faculty members for providing necessary facilities and hardware along with timely guidance and suggestions for implementing this ...
arXiv:1312.5469v1
fatcat:b6ineend65af7padicrdfarxfa
Map-Reduce Implementations: Survey and Performance Comparison
2015
International Journal of Computer Science & Information Technology (IJCSIT)
The comparison of various Map Reduce implementations is done with the most popular implementation Hadoop and other similar implementations using other platforms. ...
of the MapReduce framework. ...
and programming libraries for Hadoop. ...
doi:10.5121/ijcsit.2015.7410
fatcat:egex75g65rg3nbialubagvllpi
Hadoop Ecosystem and Its Analysis on Tweets
2015
Procedia - Social and Behavioral Sciences
MapReduce is a programming model which is used for processing and generating large data sets with a parallel, distributed algorithm on a cluster. ...
Hadoop is a Java-based programming framework for distributed storage and processing of large data sets on commodity hardware. It is developed by the Apache Software Foundation as an open-source framework. ...
Hadoop has two main components: Hadoop Distributed File System (HDFS) for distributed storage and MapReduce for distributed processing. ...
doi:10.1016/j.sbspro.2015.06.429
fatcat:bfwlz5oyw5cl3poew7jny5jgge
An Implementation of Map Reduce on the Hadoop for Analyzing Big Data
2019
International journal of recent technology and engineering
The Hadoop distributed architecture with MapReduce programming is analyzed here. ...
We will compare against the already existing MapReduce technique with Hadoop to afford high performance and efficiency for large volumes of data. ...
In 2004, Google introduced a programming model called the MapReduce paradigm, which is now widely associated with Apache Hadoop. ...
doi:10.35940/ijrte.d1115.1284s219
fatcat:h7snclnpznborj7jofdrhjlpoy
BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink
2019
Frontiers in Health Informatics
The purpose of this study is to introduce and compare the most popular and most widely used platform for processing big data, Apache Hadoop MapReduce, and the two Apache Spark and Apache Flink platforms ...
Overall, the findings showed that the Apache Hadoop environment has simplicity, error detection, and scalability management based on clusters, but because its processing is based on batch processing, it ...
to 100 times faster than the Hadoop because of processing in memory, it also works better with MapReduce in executing the program on the disk. Supports various programming languages from Python to Scala ...
doi:10.30699/fhi.v8i1.180
fatcat:bmj24xxzbffnnhylvl7er6cytu
Big Data Analysis with Dataset Scaling in Yet Another Resource Negotiator (YARN)
2014
International Journal of Computer Applications
So we have used Apache Hadoop Distributed File System (HDFS) for storage and analysis. This paper shows experimental work done on the MapReduce Application on Health sector dataset. ...
The main problem is to check the behavior of the MapReduce applications as the size of the dataset increases. Our analysis lies in understanding the Apache MapReduce application performance. ...
To verify the result, we identify the map- and reduce-intensive programs. In this way, we analyze the behavior of the Apache MapReduce application with the dataset. ...
doi:10.5120/16009-5051
fatcat:zzd7fp5emraqzkbd632a27uoxu
Big Data Analysis using Apache Hadoop and Spark
2019
International journal of recent technology and engineering
CONCLUSION In this paper, two programming models, MapReduce and Apache Spark, have been presented for examining their execution with Hadoop. MapReduce and Apache Spark can both adapt to each kind of information ...
Apache Hadoop has HDFS, which is used for storing data in a distributed environment, and MapReduce, which is a Hadoop programming model. Apache Spark is an open-source engine for large-scale data ...
doi:10.35940/ijrte.a2128.078219
fatcat:47tmuscnrnbb5p5vphrahborbu