H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution [chapter]

Petar Jovanovic, Oscar Romero, Toon Calders, Alberto Abelló
2016 Lecture Notes in Computer Science  
We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the benefits of automatic data redistribution.  ...  In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications.  ...  Workload-driven Redistribution of Data In this section, we tackle the previously discussed challenges, and present our algorithm for workload-driven redistribution of data, namely, H-WorD.  ... 
doi:10.1007/978-3-319-44039-2_21 fatcat:fs73k4otknhnhbj25wlvfhttwi

Accelerating MapReduce Analytics Using CometCloud

Moustafa AbdelBaky, Hyunjoo Kim, Ivan Rodero, Manish Parashar
2012 2012 IEEE Fifth International Conference on Cloud Computing  
Heterogeneity is also unavoidable in scientific applications that process a varying number of datasets of different sizes. In these cases, the performance of MapReduce-Hadoop can be a concern.  ...  These resources can be selected from a hybrid infrastructure such as local clusters, data centers, and public clouds.  ...  The research presented in this paper is supported in part by National Science Foundation via grants numbers IIP 0758566, CCF-0833039, DMS-0835436, CNS 0426354, IIS 0430826, and CNS 0723594, by Department  ... 
doi:10.1109/cloud.2012.150 dblp:conf/IEEEcloud/AbdelBakyKRP12 fatcat:oxqjtykasbffxlhgjgniojb6me

Resource Management for Dynamic MapReduce Clusters in Multicluster Systems

Bogdan Ghit, Nezih Yigitbasi, Dick Epema
2012 2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
In this paper, we design and implement a resource management system to facilitate the on-demand isolated deployment of MapReduce clusters in multicluster systems.  ...  To efficiently manage the underlying physical resources, we propose three provisioning policies for dynamically resizing MapReduce clusters, and we evaluate the performance of our system through experiments  ...  and transient nodes in an MR cluster, and we run benchmarks to compare the resizing policies.  ... 
doi:10.1109/sc.companion.2012.151 dblp:conf/sc/GhitYE12 fatcat:3qm6l6cmnzdf5pg3nihcz5pyci

Hybrid Job-Driven Scheduling for Virtual MapReduce Clusters

Ming-Chang Lee, Jia-Chun Lin, Ramin Yahyapour
2016 IEEE Transactions on Parallel and Distributed Systems  
JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs.  ...  To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective.  ...  In Hadoop, a MapReduce cluster consists of two masters called JobTracker [2] and NameNode [2] and a set of slaves.  ... 
doi:10.1109/tpds.2015.2463817 fatcat:o4f44qz5fzadrjkbl23ghpo5li

DyScale: A MapReduce Job Scheduler for Heterogeneous Multicore Processors

Feng Yan, Ludmila Cherkasova, Zhuoyao Zhang, Evgenia Smirni
2017 IEEE Transactions on Cloud Computing  
MapReduce jobs, while offering improved throughput (up to 40 percent) for large, batch jobs.  ...  The functionality of modern multi-core processors is often driven by a given power budget that requires designers to evaluate different decision trade-offs, e.g., to choose between many slow, power-efficient  ...  We select 13 diverse MapReduce applications [2] to run experiments in our Hadoop cluster. The high level description of these applications is given in Table 2.  ... 
doi:10.1109/tcc.2015.2415772 fatcat:ls62o7yt2vbulnpw5zfwzj5lui

MapReduce Scheduler: A 360-degree view [article]

Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri
2017 arXiv   pre-print
In this paper, we provide in-depth insight on the MapReduce scheduling algorithm.  ...  Undoubtedly, the MapReduce is the most powerful programming paradigm in distributed computing. The enhancement of the MapReduce is essential and it can lead the computing faster.  ...  A novel job scheduling technique is proposed, Throughput driven [29] task scheduler for obtaining high system throughput in the job-intensive MapReduce environment.  ... 
arXiv:1704.02632v1 fatcat:44kf7vncjjha7paeilhyuupj5a

Performance-driven task co-scheduling for MapReduce environments

Jorda Polo, David Carrera, Yolanda Becerra, Malgorzata Steinder, Ian Whalley
2010 2010 IEEE Network Operations and Management Symposium - NOMS 2010  
In this paper, we address this problem by introducing a new task scheduler for a MapReduce framework that allows performance-driven management of MapReduce tasks.  ...  MapReduce is a data-driven programming model proposed by Google in 2004 which is especially well suited for distributed data analytics applications.  ...  Task selection and slave node assignment govern a job's opportunity to make progress, and thus influences job performance.  ... 
doi:10.1109/noms.2010.5488494 dblp:conf/noms/PoloCBSW10 fatcat:2r4ljikgonb23nv4vurgqvll7e

Heterogeneous cores for MapReduce processing: Opportunity or challenge?

Feng Yan, Ludmila Cherkasova, Zhuoyao Zhang, Evgenia Smirni
2014 2014 IEEE Network Operations and Management Symposium (NOMS)  
e.g., to choose between many slow cores, fewer faster cores, or to select a combination of them.  ...  In this work, we design a new Hadoop scheduler, called DyScale, that exploits capabilities offered by heterogeneous cores for achieving a variety of performance objectives.  ...  Job scheduling in Hadoop is performed by a master node called the JobTracker, which manages a number of worker nodes in the cluster.  ... 
doi:10.1109/noms.2014.6838339 dblp:conf/noms/YanCZS14 fatcat:jliihgk4ozbl3il4slco4r5ihe

Minimizing Remote Accesses in MapReduce Clusters

Prateek Tandon, Michael J. Cafarella, Thomas F. Wenisch
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
MapReduce, in particular Hadoop, is a popular framework for the distributed processing of large datasets on clusters of relatively inexpensive servers.  ...  We find that, in an unconstrained cluster where a job's map tasks may be scheduled dynamically on any node over time, Hadoop's default random data placement is effective in avoiding remote accesses.  ...  In contrast to the authors' approach, we focus on how intelligent data placement can be used to maximize MapReduce efficiency in scenarios where node allocations are restricted.  ... 
doi:10.1109/ipdpsw.2013.195 dblp:conf/ipps/TandonCW13 fatcat:3c3yo5aqevcpbpuagvgp3udnqi

Governing energy consumption in Hadoop through CPU frequency scaling: An analysis

Shadi Ibrahim, Tien-Dat Phan, Alexandra Carpen-Amarie, Houssem-Eddine Chihoub, Diana Moise, Gabriel Antoniu
2016 Future generations computer systems  
In this paper, we focus on MapReduce processing and we investigate the impact of dynamically scaling the frequency of compute nodes on the performance and energy consumption of a Hadoop cluster.  ...  Most large-scale data computations in the cloud heavily rely on the MapReduce paradigm and on its Hadoop implementation.  ...  tuning policy when relying on static MapReduce phase detection, similar to [  ...  Dynamic tuning policies: We study two coarse-grained frequency tuning policies: Dynamic CPU frequency.  ... 
doi:10.1016/j.future.2015.01.005 fatcat:tr7ld7ykkjcypgpt4eyr4accmm

Factors affecting cloud data-center efficiency: a scheduling algorithm-based analysis

Shehloo Arif Ahmad, Butt Muheet Ahmed, Zaman Majid
2021 International Journal of Advanced Technology and Engineering Exploration  
Google developed the MapReduce programming paradigm to counter this problem, which served as the foundation for Apache's open-source Hadoop project.  ...  In addition, many of these kinds of platforms must have features like parallel processing, fault tolerance, data dissemination, scalability, availability, and load balancing.  ...  [58] proposed a load-driven Dynamic Slot Controller (DSC) technique that modifies the map and reduce task slots in response to the slave nodes' workload.  ... 
doi:10.19101/ijatee.2021.874313 fatcat:2bwdcxpac5ccddxdcgpshrrbye

Rough Sets Base Associative Classification Rules Extraction from Big Data

But in our literature review, we find that for the MapReduce system there is an absolute lack of rough set-based technique.  ...  Many conventional data analytics techniques have been extended to the MapReduce framework to process Big Data.  ...  In conflicting data, RST-driven rule generation system may generate minor and non-redundant rule sets.  ... 
doi:10.35940/ijitee.a9140.119119 fatcat:p6sn2hndyrfhbdsnwjauybdpba

Towards a data-centric view of cloud security

Wenchao Zhou, Micah Sherr, William R. Marczak, Zhuoyao Zhang, Tao Tao, Boon Thau Loo, Insup Lee
2010 Proceedings of the second international workshop on Cloud data management - CloudDB '10  
In particular, we explore the security properties of secure data sharing between applications hosted in the cloud.  ...  In this paper, we take an alternative perspective and propose a data-centric view of cloud security.  ...  In the Filtering Phase, the master node distributes 6,400 randomly selected webpages from the Stanford Web-Base project [2] to map workers.  ... 
doi:10.1145/1871929.1871934 dblp:conf/cikm/ZhouSMZTLL10 fatcat:jajge32swnhkhks2fykd5hrb7i

A contention aware hybrid evaluator for schedulers of big data applications in computer clusters

Shouvik Bardhan, Daniel A. Menasce
2014 2014 IEEE International Conference on Big Data (Big Data)  
This method is then implemented in Mumak, a popular Hadoop job-trace simulator, making it contention-aware.  ...  This paper presents a Trace Driven Analytic Model (TDAM) methodology to assess the impact of different scheduling schemes on job execution times.  ...  We now discuss experiments based on a production level map-only MapReduce job run on a 7-node 4-core per node cluster.  ... 
doi:10.1109/bigdata.2014.7004439 dblp:conf/bigdataconf/BardhanM14 fatcat:cnel7wbznbhhnlbyz4hplbnox4

STEAMEngine: Driving MapReduce provisioning in the cloud

Michael Cardosa, Piyush Narang, Abhishek Chandra, Himabindu Pucha, Aameek Singh
2011 2011 18th International Conference on High Performance Computing  
MapReduce has gained in popularity as a distributed data analysis paradigm, particularly in the cloud, where MapReduce jobs are run on virtual clusters.  ...  Our experimental results based on an Amazon EC2 cluster and a local 6-node Xen/Hadoop cluster show the benefits of STEAMEngine through improvements in performance and energy via the use of these algorithms  ...  Each node in a MapReduce cluster operates on equal sized blocks which contain similar content (records).  ... 
doi:10.1109/hipc.2011.6152649 dblp:conf/hipc/CardosaNCPS11 fatcat:2syypqwkdbadfcsz6k7qwhqffe