
Big Data, RDBMS and HADOOP - A Comparative Study

2016 International Journal of Science and Research (IJSR)  
Big Data and Hadoop have recently become trending terms in the world of the internet.  ...  RDBMS and its benefits created a revolution in the field of data handling.  ...  According to the inventors of Hadoop, it could be a better substitute for MapReduce, and this has been confirmed throughout the comparative study. Paper ID: NOV162167  ... 
doi:10.21275/v5i3.nov162167 fatcat:3iygyl75sbecxkkgqiwk3dfbzq


Wenting He, Youliang Yan, Huimin Cui, Binbin Lu, Jiacheng Zhao, Shengmei Li, Gong Ruan, Jingling Xue, Xiaobing Feng, Wensen Yang
2015 Proceedings of the 29th ACM International Conference on Supercomputing - ICS '15  
To support the approach, we present a heterogeneous MapReduce framework, Hadoop+, which enables CPUs and GPUs to process big data in a coordinated manner, and leverages the heterogeneity model to assist users in  ...  Despite the widespread adoption of heterogeneous clusters in modern data centers, modeling heterogeneity is still a big challenge, especially for large-scale MapReduce applications.  ...  Second, the "Data Transfer Engine" accumulates a chunk of data and transfers it to the target device, according to the computing resource obtained via the "Resource Request Agent".  ... 
doi:10.1145/2751205.2751236 dblp:conf/ics/HeCLZLRXFYY15 fatcat:vcdt3vdwlrczlbkxy3iqcdg4ey
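The "accumulate a chunk, then transfer" behavior described in the snippet above can be sketched in a few lines. This is a minimal illustration of the idea, not Hadoop+'s actual implementation; the class and parameter names (`ChunkingTransferEngine`, `chunk_size`, `dispatch`) are hypothetical.

```python
from collections import defaultdict

class ChunkingTransferEngine:
    """Buffer records per target device; transfer only full chunks."""

    def __init__(self, chunk_size, dispatch):
        self.chunk_size = chunk_size   # records accumulated per transfer
        self.dispatch = dispatch       # callback: (device, chunk) -> None
        self.buffers = defaultdict(list)

    def send(self, device, record):
        # Accumulate; hand off a chunk once the buffer is full.
        buf = self.buffers[device]
        buf.append(record)
        if len(buf) >= self.chunk_size:
            self.dispatch(device, buf)
            self.buffers[device] = []

    def flush(self):
        # Push out any partial chunks at the end of the stream.
        for device, buf in self.buffers.items():
            if buf:
                self.dispatch(device, buf)
        self.buffers.clear()
```

Batching transfers this way amortizes per-transfer overhead (e.g. PCIe latency to a GPU) over many records, which is the usual motivation for chunked device transfer.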

Kepler + Hadoop

Jianwu Wang, Daniel Crawl, Ilkay Altintas
2009 Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science - WORKS '09  
Using the presented Hadoop components in Kepler, scientists can easily utilize MapReduce in their domain-specific problems and connect them with other tasks in a workflow through the Kepler graphical user  ...  MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications.  ...  [18] proposes and compares different strategies for compiling XML data processing pipelines to a set of MapReduce tasks (implemented within Hadoop) for efficient execution.  ... 
doi:10.1145/1645164.1645176 dblp:conf/sc/WangCA09 fatcat:coybwhjzdfeizngmypub4ru2gu
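The MapReduce programming model that the Kepler components expose to scientists can be summarized with a tiny in-process sketch (map, group by key, reduce). This uses no Hadoop or Kepler API; it only illustrates the model.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over records, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def wc_mapper(line):
    # Emit one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]

def wc_reducer(word, counts):
    # Sum the partial counts for one word.
    return sum(counts)
```

In a real Hadoop job the grouping step is the distributed shuffle; here it is just a dictionary, which is enough to show how a domain task plugs into the model.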


Jens Dittrich, Jorge-Arnulfo Quiané-Ruiz, Alekh Jindal, Yagiz Kargin, Vinay Setty, Jörg Schad
2010 Proceedings of the VLDB Endowment  
However, this comes at a price: MapReduce processes tasks in a scan-oriented fashion.  ...  MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research.  ...  However, all of the above approaches perform the join operation in the reduce phase and hence transfer a large amount of data through the network -which is a potential bottleneck.  ... 
doi:10.14778/1920841.1920908 fatcat:gldzhlwrmfen3kaqsdmwhlon7e

Data Partitioning for Minimizing Transferred Data in MapReduce [chapter]

Miguel Liroz-Gistau, Reza Akbarinia, Divyakant Agrawal, Esther Pacitti, Patrick Valduriez
2013 Lecture Notes in Computer Science  
The results show a large reduction in data transfer during the shuffle phase compared to native Hadoop.  ...  We evaluated our approach through experimentation in a Hadoop deployment on top of Grid'5000 using standard benchmarks.  ...  Experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities  ... 
doi:10.1007/978-3-642-40053-7_1 fatcat:ndo3rnq6jvawpkgwmvhw5rbghm
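The general idea behind partitioning to minimize shuffle traffic can be sketched as a locality-aware partitioner: route a key to the reducer on the node that already holds most of that key's intermediate data, falling back to hashing otherwise. This is an illustrative sketch under assumed names (`preferred_node` map), not the paper's actual algorithm.

```python
def hash_partition(key, n_reducers):
    """Default Hadoop-style partitioning: hash the key."""
    return hash(key) % n_reducers

def locality_partition(key, n_reducers, preferred_node):
    """Route keys with a known preferred reducer there; hash the rest.

    preferred_node maps a key to the reducer index whose node already
    holds most of that key's intermediate data, so assigning the key
    there avoids moving that data over the network.
    """
    if key in preferred_node:
        return preferred_node[key] % n_reducers
    return hash(key) % n_reducers
```

The gain comes entirely from the keys in the preferred map: their values never cross the network during the shuffle.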

Bi-Hadoop: Extending Hadoop to Improve Support for Binary-Input Applications

Xiao Yu, Bo Hong
2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing  
This often leads to excessive data transfers and significant degradations in application performance.  ...  Bi-Hadoop integrates an easy-to-use user interface, a binary-input aware task scheduler, and a caching subsystem.  ...  Without knowing which data blocks may be accessed by a task, the Hadoop scheduler cannot perform locality-aware scheduling, which will result in excessive data transfer overheads. B.  ... 
doi:10.1109/ccgrid.2013.56 dblp:conf/ccgrid/YuH13 fatcat:ub6bpnmvfjhqliptbxs34r3ahm

SMARTH: Enabling Multi-pipeline Data Transfer in HDFS

Hong Zhang, Liqiang Wang, Hai Huang
2014 43rd International Conference on Parallel Processing  
Specifically, SMARTH is able to improve the throughput of data transfer by 27-245% in a heterogeneous virtual cluster on Amazon EC2.  ...  In this paper, we introduce an improved HDFS design called SMARTH. It utilizes asynchronous multi-pipeline data transfers instead of a single pipeline stop-and-wait mechanism.  ...  ACKNOWLEDGEMENT This work was supported in part by NSF-CAREER-1054834.  ... 
doi:10.1109/icpp.2014.12 dblp:conf/icpp/ZhangWH14 fatcat:vx2g4cvncbc7tarolsd7vy4iv4
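The benefit of replacing stop-and-wait replication with pipelined transfer, as the SMARTH snippet describes, can be shown with a simple timing model. Times are abstract units and the functions are illustrative, not a model of HDFS internals.

```python
def stop_and_wait_time(n_blocks, n_replicas, t_block):
    """Each block traverses every replica hop before the next block starts."""
    return n_blocks * n_replicas * t_block

def pipelined_time(n_blocks, n_replicas, t_block):
    """Hops overlap: once the pipeline is full, one block finishes per step."""
    return (n_blocks + n_replicas - 1) * t_block
```

For a long stream of blocks the pipelined time approaches `n_blocks * t_block`, i.e. the replica count stops multiplying the transfer time, which is the intuition behind multi-pipeline designs.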

Adaptive Preshuffling in Hadoop Clusters

Jiong Xie, Yun Tian, Shu Yin, Ji Zhang, Xiaojun Ruan, Xiao Qin
2013 Procedia Computer Science  
In this paper, we propose a new preshuffling strategy in Hadoop to reduce the high network load imposed by shuffle-intensive applications.  ...  We implemented the push model and a pipeline along with the preshuffling scheme in the Hadoop system.  ...  In an early stage of this study, we observed that a Hadoop application's execution time is greatly affected by the amount of data transferred during the shuffle phase.  ... 
doi:10.1016/j.procs.2013.05.422 fatcat:q4lvmnrxdzbp3pjnhoizostkne
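The observation that shuffle volume dominates execution time can be made concrete with a combiner-style pre-aggregation sketch: instead of shuffling every raw `(key, 1)` pair, ship one `(key, count)` pair per distinct key. Note this illustrates only the shuffle-volume intuition, not the paper's push/pipeline preshuffling mechanism.

```python
from collections import Counter

def map_output(words):
    """Raw map output: one (word, 1) pair per occurrence."""
    return [(word, 1) for word in words]

def preshuffle_combine(pairs):
    """Aggregate map output locally before it is shuffled."""
    combined = Counter()
    for key, value in pairs:
        combined[key] += value
    return sorted(combined.items())
```

When keys repeat heavily, the combined output is far smaller than the raw output, so far fewer bytes cross the network during the shuffle.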

Hadoop Image Processing Framework

Sridhar Vemula, Christopher Crick
2015 IEEE International Congress on Big Data  
The emergence of processing frameworks such as the Hadoop MapReduce[1] platform addresses the problem of providing a system for computationally intensive data processing and distributed storage.  ...  To address this we have developed the Hadoop Image Processing Framework, which provides a Hadoop-based library to support large-scale image processing.  ...  Studies using Hadoop have been performed, dealing with text data files [3] , analyzing large volumes of DNA sequence data [4] , converting the data of a large number of still images to PDF format, and  ... 
doi:10.1109/bigdatacongress.2015.80 dblp:conf/bigdata/VemulaC15 fatcat:hwww2m2vynamxpvznbuorn6gja

XRootD popularity on Hadoop clusters

Marco Meoni, Tommaso Boccali, Nicolò Magini, Luca Menichetti, Domenico Giordano
2017 Journal of Physics: Conference Series  
Figure 8 shows the performance speed-up in Hadoop compared to Oracle. Each data point is averaged on 3 tests.  ...  They are currently spread out in several Oracle-based data sources, which we need to transfer, aggregate, map and reduce for data analytics leveraging the Hadoop cluster.  ... 
doi:10.1088/1742-6596/898/7/072027 fatcat:4sjeq7a4m5bw7kjk5cxstjmpna

Hadoop on Named Data Networking

Mathias Gibbens, Chris Gniady, Lei Ye, Beichuan Zhang
2017 Proceedings of the ACM on Measurement and Analysis of Computing Systems  
This paper presents and discusses our experience in modifying Apache Hadoop, a popular MapReduce framework, to operate on an NDN network.  ...  Through detailed evaluation, we show a reduction of 16% for overall data transmission between Hadoop nodes while writing data with default replication settings.  ...  The increase in broadcast traffic for the idle node (H12) does not introduce a significant amount of load on the network as compared to the overall data transferred.  ... 
doi:10.1145/3084439 dblp:journals/pomacs/GibbensGYZ17 fatcat:plf4ajwzgnezhmpkr66cudtthe

YARNsim: Simulating Hadoop YARN

Ning Liu, Xi Yang, Xian-He Sun, Johnathan Jenkins, Robert Ross
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
The experiments prove that YARNsim can provide what-if analysis for system designers in a timely manner and at minimal cost compared with testing and evaluating on a real system.  ...  The next generation of Hadoop, Apache Hadoop YARN, is designed to address these issues. In this paper, we propose YARNsim, a simulation system for Hadoop YARN.  ...  We plan to develop a capacity scheduler, fair scheduler, and other advanced scheduling algorithms in the near future to support the simulation of complex job and task execution.  ... 
doi:10.1109/ccgrid.2015.61 dblp:conf/ccgrid/LiuYSJR15 fatcat:w4wv2ldpgzcz3akzl4uq7msdy4

Managing data transfers in computer clusters with orchestra

Mosharaf Chowdhury, Matei Zaharia, Justin Ma, Michael I. Jordan, Ion Stoica
2011 Computer Communication Review  
Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5× compared to the status quo in Hadoop.  ...  Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages.  ...  This research was supported in part by gifts from AMPLab founding sponsors Google and SAP, AMPLab sponsors Amazon Web Services, Cloudera, Huawei, IBM, Intel, Microsoft, NEC, NetApp, and VMWare, and by  ... 
doi:10.1145/2043164.2018448 fatcat:5otdyc3x6bax7oxsbybkffk7wm
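The kind of broadcast speed-up Orchestra reports can be illustrated with a round-count model: if receivers that already hold the data re-serve it (cooperative, BitTorrent-like relaying), the number of holders doubles each round, versus the source sending a full copy to each receiver in turn. Abstract rounds, illustrative only.

```python
import math

def sequential_broadcast_rounds(n_receivers):
    """One source sends the full copy to each receiver, one after another."""
    return n_receivers

def cooperative_broadcast_rounds(n_receivers):
    """Every node holding the data sends to one new node per round,
    so the holder count doubles each round."""
    return math.ceil(math.log2(n_receivers + 1))
```

The gap grows with cluster size: logarithmic rounds instead of linear, which is why cooperative schemes dominate for large broadcasts.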

Straggler handling approaches in mapreduce framework: a comparative study

Anwar H. Katrawi, Rosni Abdullah, Mohammed Anbar, Ibrahim AlShourbaji, Ammar Kamal Abasi
2021 International Journal of Electrical and Computer Engineering (IJECE)  
However, stragglers are a major bottleneck in big data processing, and hence the early detection and accurate identification of stragglers can have important impacts on the performance of big data processing  ...  The proliferation of information technology produces a huge amount of data, called big data, that cannot be processed by traditional database systems.  ... 
doi:10.11591/ijece.v11i1.pp375-382 fatcat:odidhj5otnepdd73cvqdq4urom
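A common baseline among the straggler-handling approaches such a survey compares is threshold-based detection for speculative execution: flag tasks whose progress rate falls well below the median, then launch backup copies. The sketch below is a generic illustration with an assumed threshold factor, not any specific surveyed algorithm.

```python
from statistics import median

def find_stragglers(progress_rates, slow_factor=0.5):
    """Return task ids progressing at less than slow_factor * median rate.

    progress_rates: dict mapping task id to fraction of work done per second.
    slow_factor is an illustrative tuning knob, not a standard value.
    """
    med = median(progress_rates.values())
    return sorted(task for task, rate in progress_rates.items()
                  if rate < slow_factor * med)
```

The trade-off the survey context highlights: a low threshold misses slow tasks, while a high one wastes resources on needless backup copies.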

A Comparative Study of Data Processing Approaches for Text Processing Workflows

Ting Chen, Kenjiro Taura
2012 2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
This paper studies three real-world text processing workflows and develops them on top of several different large data processing approaches including an open source MapReduce implementation -Hadoop, a  ...  Workflows are widely used in data-intensive applications since they facilitate the composition of individual executables or scripts, providing easy-to-use parallelization to domain experts.  ...  In the present paper we extend the previous study by comparing four approaches, files, Hadoop, Hive, and ParaLite, and discuss their strengths/weaknesses both in terms of programmability and performance  ... 
doi:10.1109/sc.companion.2012.152 dblp:conf/sc/ChenT12 fatcat:y4vmybltyfgitpwgydiletpdke