Boosting MapReduce with Network-Aware Task Assignment [chapter]

Fei Xu, Fangming Liu, Dekang Zhu, Hai Jin
2014 Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering  
We further design a network-aware task assignment strategy to shorten the completion time of MapReduce jobs in shared clusters.  ...  task assignment strategies, yet with an acceptable computational overhead.  ...  Effectiveness of Network-Aware Task Assignment To illustrate the effectiveness of our network-aware task assignment strategy, we compared it with three widely-used strategies, i.e., random assignment,  ... 
doi:10.1007/978-3-319-05506-0_8 fatcat:ph7646za2zeq5j4reho2odov24

Software Design and Implementation for MapReduce across Distributed Data Centers

Lizhe Wang, Jie Tao, Yan Ma, Samee U. Khan, Joanna Kołodziej, Dan Chen
2013 Applied Mathematics & Information Sciences  
G-Hadoop uses the Gfarm file system as an underlying file system and executes MapReduce tasks across distributed clusters.  ...  The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications.  ...  In traditional Hadoop clusters with HDFS, map tasks are preferably assigned to nodes where the required input data is locally present.  ... 
doi:10.12785/amis/071l13 fatcat:7ip6kgxvc5dgzcyicdhxpyaxem

Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization

Zhenhua Guo, Geoffrey Fox
2012 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)  
We investigate network heterogeneity aware scheduling of both map and reduce tasks.  ...  In MapReduce, map and reduce tasks are assigned to map and reduce slots hosted by worker nodes. Usually the numbers of map and reduce slots are carefully chosen to gain optimal resource usage.  ...  However, long tasks may get starved and priority boosting can be used to avoid starvation. The second heuristic is to choose the task with maximum expected completion time among C_j, shown in (8).  ... 
doi:10.1109/ccgrid.2012.12 dblp:conf/ccgrid/GuoF12 fatcat:7cmx4wtff5g2dcsjpzdjqiqfqy

A survey on bandwidth-aware geo-distributed frameworks for big-data analytics

Mohammed Bergui, Said Najah, Nikola S. Nikolov
2021 Journal of Big Data  
In this article, we discuss challenges and survey the latest geo-distributed big-data analytics frameworks and schedulers (based on MapReduce and Spark) with WAN-bandwidth awareness.  ...  While cluster computing applications, such as MapReduce and Spark, have been widely deployed in data centres to support commercial applications and scientific research, they are not designed for running  ...  MapReduce jobs are submitted to a resource manager that supervises and assigns the execution of tasks to node managers.  ... 
doi:10.1186/s40537-021-00427-9 fatcat:u2jx7x6hkfc47kn2iqpkcquhi4


L. A. Steffenel, O. Flauzac, A. S. Charao, P. P. Barcelos, B. Stein, G. Cassales, S. Nesmachnow, J. Rey, M. Cogorno, M. Kirsch-Pinheiro, C. Souveyet
2014 Journal of Computer Science  
context-awareness and fault-tolerance features; and providing an alternative pervasive grid implementation, fully adapted to dynamic environments.  ...  fault-tolerance features to provide efficient and reliable MapReduce services on pervasive grids.  ...  Container assignment with context-awareness configuration simulating heterogeneous environment Table 1.  ... 
doi:10.3844/jcssp.2014.2194.2210 fatcat:7nd7azrvifc6xi6wqrdwp274cu

FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce

Yaling Xun, Jifu Zhang, Xiao Qin
2016 IEEE Transactions on Systems, Man & Cybernetics. Systems  
JobTracker is responsible for assigning and scheduling tasks; each TaskTracker handles Map or Reduce tasks assigned by JobTracker.  ...  The overarching goal of FiDoop-DP is to boost the performance. A similarity metric to facilitate data-aware partitioning.  ... 
doi:10.1109/tsmc.2015.2437327 fatcat:sgodseagojalzpvby7svpxgt74

Locality-Aware Reduce Task Scheduling for MapReduce

Mohammad Hammoud, Majd F. Sakr
2011 2011 IEEE Third International Conference on Cloud Computing Technology and Science  
LARTS attempts to collocate reduce tasks with the maximum required data computed after recognizing input data network locations and sizes.  ...  This paper describes Locality-Aware Reduce Task Scheduler (LARTS), a practical strategy for improving MapReduce performance.  ...  Thus, similar to map task scheduling, we suggest making MapReduce aware of partitions' network locations in order to apply locality to reduce task scheduling.  ... 
doi:10.1109/cloudcom.2011.87 dblp:conf/cloudcom/HammoudS11 fatcat:iyrlovqosnfoldayb4lq7qoh4q

Investigation of data locality and fairness in MapReduce

Zhenhua Guo, Geoffrey Fox, Mo Zhou
2012 Proceedings of third international workshop on MapReduce and its Applications Date - MapReduce '12  
Its data locality aware scheduling strategy exploits the locality of data accessing to minimize data movement and thus reduce network traffic.  ...  In data-intensive computing, MapReduce is an important tool that allows users to process large amounts of data easily.  ...  For typical MapReduce clusters where most jobs are small, scheduling delay of several seconds is sufficient to generate performance boost.  ... 
doi:10.1145/2287016.2287022 fatcat:to2ism2yxvfmhfuko52htnfzqe

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks [article]

Sanaa Hamid Mohamed, Taisir E.H. El-Gorashi, Jaafar M.H. Elmirghani
2019 arXiv   pre-print
This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks.  ...  MapReduce and Hadoop thus introduce innovative, efficient, and accelerated intensive computations and analytics.  ...  8 servers (using Pica8 3297), trace-driven simulations [277]* Network-aware MapReduce tasks placement to reduce transmission costs in DCNs Hadoop 1.2.1 Probabilistic tasks scheduling algorithm  ... 
arXiv:1910.00731v1 fatcat:kvi3br4iwzg3bi7fifpgyly7m4

MapReduce across Distributed Clusters for Data-intensive Applications

Lizhe Wang, Jie Tao, Holger Marten, Achim Streit, Samee U. Khan, Joanna Kolodziej, Dan Chen
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum  
G-Hadoop uses the Gfarm file system as an underlying file system and executes MapReduce tasks across distributed clusters.  ...  The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications.  ...  In traditional Hadoop clusters with HDFS, map tasks are preferably assigned to nodes where the required input data is locally present.  ... 
doi:10.1109/ipdpsw.2012.249 dblp:conf/ipps/WangTMSKKC12 fatcat:xnrdnqzpubgm5lm7jwhwejuo5m

A cross-job framework for MapReduce scheduling

Xuejie Xiao, Jian Tang, Zhenhua Chen, Jielong Xu, Chonggang Wang
2014 2014 IEEE International Conference on Big Data (Big Data)  
. (2) It can support all the existing MapReduce applications with no changes to their source code. (3) It is a general framework, which can work with different scheduling algorithms.  ...  Our experimental results show that the cross-job Hadoop can significantly reduce both the total processing time of a job sequence and the size of data transferred over the network.  ...  HaLoop not only extends MapReduce with programming support for iterative applications, but also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching  ... 
doi:10.1109/bigdata.2014.7004222 dblp:conf/bigdataconf/XiaoTCXW14 fatcat:pd34gbqp2nccbkok3424ck7hxm

H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution [chapter]

Petar Jovanovic, Oscar Romero, Toon Calders, Alberto Abelló
2016 Lecture Notes in Computer Science  
Today's distributed data processing systems typically follow a query shipping approach and exploit data locality for reducing network traffic.  ...  We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the benefits of automatic data redistribution.  ...  Using such information, we can proactively perform data redistribution in advance for boosting tasks' data locality and parallelism of the MapReduce jobs.  ... 
doi:10.1007/978-3-319-44039-2_21 fatcat:fs73k4otknhnhbj25wlvfhttwi

Support Vector Regression based Mapreduce Throttled Load Balancer for Data Centers

In order to improve the load balancing with maximum throughput and minimum makespan, Support Vector Regression based MapReduce Throttled Load Balancing (SVR-MTLB) technique is introduced.  ...  Therefore, the incoming tasks are allocated with better utilization of resources to minimize the workload across the server in the cloud.  ...  The task assigner performs priority task classification of incoming tasks using gradient Boosting ensemble classifier.  ... 
doi:10.35940/ijitee.a6102.119119 fatcat:zsqjyqn2yff7phntko5k6yipau

Toward scalable internet traffic measurement and analysis with Hadoop

Yeonhee Lee, Youngseok Lee
2012 Computer communication review  
We also explain the performance issues related with traffic analysis MapReduce jobs.  ...  From experiments with a 200-node testbed, we achieved 14 Gbps throughput for 5 TB files with IP and HTTP-layer analysis MapReduce jobs.  ...  such as CPU, memory, hard disk, and network, and the other with MapReduce algorithm optimization.  ... 
doi:10.1145/2427036.2427038 fatcat:43elfcm5kbdbbojjvj7ljevwmm

Hadoop MapReduce for Mobile Clouds

Johnu George, Chien-An Chen, Radu Stoleru, Geoffrey Xie
2016 IEEE Transactions on Cloud Computing  
., caused by unexpected device failures or topology changes in a dynamic network).  ...  We have developed the Hadoop MapReduce framework over MDFS and have studied its performance by varying input workloads in a real heterogeneous mobile cluster.  ...  Energy-aware task scheduling Hadoop Mapreduce framework relies on data locality for boosting overall system throughput. Computation is moved closer to the nodes where the data resides.  ... 
doi:10.1109/tcc.2016.2603474 fatcat:2kdyoj2xefeztc3yt2akk5bqo4