

Chandra Kala Kuruba
2017 International Journal of Advanced Research in Computer Science  
There are various tools or frameworks available for this purpose; a few of them are Apache Hadoop MapReduce, Apache Pig, Apache Hive, Apache Spark, Tez, etc.  ...  Hadoop MapReduce provides a low level of abstraction whereas Pig provides a high level of abstraction.  ...  MapReduce takes less execution time than Pig.  ... 
doi:10.26483/ijarcs.v8i9.5177 fatcat:lcc6x4u3yzai5fbdvt4tpdugea

Hadoop based Weblog Analysis: A Review

Pooja D. Savant, Debnath Bhattacharyya, Tai-hoon Kim
2016 International Journal of Software Engineering and Its Applications  
An enterprise weblog analysis system based on Hadoop architecture with Hadoop Distributed File System (HDFS), Hadoop MapReduce Software Framework and Pig Latin Language aids business decision-making  ...  DEMs can be utilized across a variety of scientific as well as engineering programs such as creating a hydrologic pattern, terrain study, as well as framework plan.  ...  Apache Hadoop is a suite of data and programming code that is used to develop software programs and applications.  ... 
doi:10.14257/ijseia.2016.10.6.02 fatcat:jyxr5kmkunhy7ck2zwqytodbq4

Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud

Rupinder Singh, Puneet Jai Kaur
2016 Journal of Big Data  
MapReduce programs are written in different programming and scripting languages. Apache Tez Distributed processing is the base of Hadoop.  ...  MapReduce is a framework which helps in writing programs for processing of data in parallel across thousands of machines [6].  ...  Apache Tez and MapReduce required different numbers of containers during execution of the same script. In the previous section we already explained that MapReduce requires more containers than Apache Tez.  ... 
doi:10.1186/s40537-016-0051-6 fatcat:rwt5zwspyba2hhwextgb3hrgbq

A study and Performance Comparison of MapReduce and Apache Spark on Twitter Data on Hadoop Cluster

Md. Nowraj Farhan, Md. Ahsan Habib, Md. Arshad Ali
2018 International Journal of Information Technology and Computer Science  
We also obtained an interesting result: as the number of blocks on the Hadoop Distributed File System increases, the run-time of both the MapReduce and Spark programs also increases, and even in  ...  MapReduce is a high-performance distributed big data programming framework which is highly preferred by most big data analysts and has been available for a long time with very good documentation.  ...  Apache Hadoop One solution to the above problem with traditional RDBMS is Apache Hadoop.  ... 
doi:10.5815/ijitcs.2018.07.07 fatcat:vcwlnn3ocfcr7jilbky57ngpty

CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds and Analyzing High-Throughput Sequencing Data with MapReduce

Wei-Chun Chung, Chien-Chih Chen, Jan-Ming Ho, Chung-Yen Lin, Wen-Lian Hsu, Yu-Chun Wang, D. T. Lee, Feipei Lai, Chih-Wei Huang, Yu-Jung Chang, Christophe Antoniewski
2014 PLoS ONE  
However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce.  ...  Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms.  ...  three steps, (2) manipulating a supported MapReduce program with the isolation method, and (3) integrating a MapReduce program with the program configuration file.  ... 
doi:10.1371/journal.pone.0098146 pmid:24897343 pmcid:PMC4045712 fatcat:mg255d6oxbcwfijsccwkqowgha

Distributed Computing Engines for Big Data Analytics

2019 International journal of recent technology and engineering  
New programming paradigms like MapReduce with the Hadoop distributed programming framework are widely used.  ...  The experimental results revealed that Apache Spark and Apache Flink outperform Hadoop. This is evaluated with different benchmark big data workloads.  ...  MapReduce is the distributed programming framework that is widely used [1]. However, it has its limitations when compared with newly emerged frameworks like Apache Flink and Apache Spark.  ... 
doi:10.35940/ijrte.b3771.078219 fatcat:bbpnou3i2rbtdiqxufjj2kdzwe

Apache Hadoop Architecture, Applications, and Hadoop Distributed File System

Pratit Raj Giri, Gajendra Sharma
2022 Semiconductor Science and Information Devices  
This paper introduces Apache Hadoop architecture, the components of Hadoop, and their significance in managing vast volumes of data in a distributed system.  ...  One of the most beneficial software frameworks used to utilize data in distributed systems is Hadoop.  ...  Apache Hadoop was created with the intention of being used in computer clusters made up of commodity hardware. With a collection of commodity clusters, Apache Hadoop enables distributed computing.  ... 
doi:10.30564/ssid.v4i1.4619 fatcat:fcujwqcedzfdjgmerr3s6l6eq4

A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce

Anjali P P, Binu A
2014 International Journal of Computer Science & Engineering Survey  
This creates the need for network traffic and weblog analysis, for which Hadoop is a suggested solution.  ...  One of the key features of Hadoop MapReduce is its coding complexity; it can be used only by programmers with highly developed programming skills.  ... 
doi:10.5121/ijcses.2014.5101 fatcat:pknnwf36zzfjtksiselmhbrobe

Network Traffic Analysis: Hadoop Pig vs Typical MapReduce [article]

PP Anjali, A Binu
2013 arXiv   pre-print
Thousands of transaction requests are handled and processed every day by different websites associated with e-commerce, e-banking, e-shopping carts, etc.  ...  Network traffic and weblog analysis plays a crucial role in such situations, where Hadoop can be suggested as an efficient solution for processing the Netflow data collected from switches as  ... 
arXiv:1312.5469v1 fatcat:b6ineend65af7padicrdfarxfa

Map-Reduce Implementations: Survey and Performance Comparison

Zeba Khanam, Shafali Agarwal
2015 International Journal of Computer Science & Information Technology (IJCSIT)  
The comparison of various Map Reduce implementations is done with the most popular implementation Hadoop and other similar implementations using other platforms.  ...  of the MapReduce framework.  ...  and programming libraries for Hadoop.  ... 
doi:10.5121/ijcsit.2015.7410 fatcat:egex75g65rg3nbialubagvllpi

Hadoop Ecosystem and Its Analysis on Tweets

Can Uzunkaya, Tolga Ensari, Yusuf Kavurucu
2015 Procedia - Social and Behavioral Sciences  
MapReduce is a programming model which is used for processing and generating large data sets with a parallel, distributed algorithm on a cluster.  ...  Hadoop is a Java-based programming framework for distributed storage and processing of large data sets on commodity hardware. It is developed by the Apache Software Foundation as an open-source framework.  ...  Hadoop basically has two main components: Hadoop Distributed File System (HDFS) for distributed storage and MapReduce for distributed processing.  ... 
doi:10.1016/j.sbspro.2015.06.429 fatcat:bfwlz5oyw5cl3poew7jny5jgge

An Implementation of Map Reduce on the Hadoop for Analyzing Big Data

2019 International journal of recent technology and engineering  
Hadoop distributed architecture with MapReduce programming is analyzed here.  ...  We will compare against the existing MapReduce technique with Hadoop to achieve high performance and efficiency for large volumes of data.  ...  In 2004, Google introduced a programming model called the MapReduce paradigm, the best-known open-source implementation of which is now Apache Hadoop.  ... 
doi:10.35940/ijrte.d1115.1284s219 fatcat:h7snclnpznborj7jofdrhjlpoy

BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink

Elham Nazari, Mohammad Hasan Shahriari, Hamed Tabesh
2019 Frontiers in Health Informatics  
The purpose of this study is to introduce and compare the most popular and most widely used platform for processing big data, Apache Hadoop MapReduce, and the two Apache Spark and Apache Flink platforms  ...  Overall, the findings showed that the Apache Hadoop environment has simplicity, error detection, and scalability management based on clusters, but because its processing is based on batch processing, it  ...  to 100 times faster than Hadoop because of processing in memory; it also works better than MapReduce when executing the program on disk. Supports various programming languages from Python to Scala  ... 
doi:10.30699/fhi.v8i1.180 fatcat:bmj24xxzbffnnhylvl7er6cytu

Big Data Analysis with Dataset Scaling in Yet Another Resource Negotiator (YARN)

Gurpreet Singh Bedi, Ashima Singh
2014 International Journal of Computer Applications  
So we have used Apache Hadoop Distributed File System (HDFS) for storage and analysis. This paper shows experimental work done on the MapReduce application on a health sector dataset.  ...  The main problem is to check the behavior of MapReduce applications as the size of the dataset increases. Our analysis lies in understanding Apache MapReduce application performance.  ...  To verify the result, we identify the map-intensive and reduce-intensive programs. In this way, we analyze the behavior of the Apache MapReduce application with the dataset.  ... 
doi:10.5120/16009-5051 fatcat:zzd7fp5emraqzkbd632a27uoxu

Big Data Analysis using Apache Hadoop and Spark

2019 International journal of recent technology and engineering  
In this paper, two programming models, MapReduce and Apache Spark, have been presented to examine their performance with Hadoop. MapReduce and Apache Spark can both handle every kind of data  ...  Apache Hadoop has HDFS, which is used for storing data in a distributed environment, and MapReduce, which is a Hadoop programming model. Apache Spark is an open-source engine for large-scale data  ... 
doi:10.35940/ijrte.a2128.078219 fatcat:47tmuscnrnbb5p5vphrahborbu