H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution
[chapter]
2016
Lecture Notes in Computer Science
We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the benefits of automatic data redistribution. ...
In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. ...
Workload-driven Redistribution of Data In this section, we tackle the previously discussed challenges, and present our algorithm for workload-driven redistribution of data, namely, H-WorD. ...
doi:10.1007/978-3-319-44039-2_21
fatcat:fs73k4otknhnhbj25wlvfhttwi
Accelerating MapReduce Analytics Using CometCloud
2012
2012 IEEE Fifth International Conference on Cloud Computing
Heterogeneity is also unavoidable in scientific applications that process a varying number of datasets of different sizes. In these cases, the performance of MapReduce-Hadoop can be a concern. ...
These resources can be selected from a hybrid infrastructure such as local clusters, data centers, and public clouds. ...
The research presented in this paper is supported in part by the National Science Foundation via grant numbers IIP 0758566, CCF-0833039, DMS-0835436, CNS 0426354, IIS 0430826, and CNS 0723594, by Department ...
doi:10.1109/cloud.2012.150
dblp:conf/IEEEcloud/AbdelBakyKRP12
fatcat:oxqjtykasbffxlhgjgniojb6me
Resource Management for Dynamic MapReduce Clusters in Multicluster Systems
2012
2012 SC Companion: High Performance Computing, Networking Storage and Analysis
In this paper, we design and implement a resource management system to facilitate the on-demand isolated deployment of MapReduce clusters in multicluster systems. ...
To efficiently manage the underlying physical resources, we propose three provisioning policies for dynamically resizing MapReduce clusters, and we evaluate the performance of our system through experiments ...
and transient nodes in an MR cluster, and we run benchmarks to compare the resizing policies. ...
doi:10.1109/sc.companion.2012.151
dblp:conf/sc/GhitYE12
fatcat:3qm6l6cmnzdf5pg3nihcz5pyci
Hybrid Job-Driven Scheduling for Virtual MapReduce Clusters
2016
IEEE Transactions on Parallel and Distributed Systems
JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. ...
To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. ...
In Hadoop, a MapReduce cluster consists of two masters called JobTracker [2] and NameNode [2] and a set of slaves. ...
doi:10.1109/tpds.2015.2463817
fatcat:o4f44qz5fzadrjkbl23ghpo5li
DyScale: A MapReduce Job Scheduler for Heterogeneous Multicore Processors
2017
IEEE Transactions on Cloud Computing
MapReduce jobs, while offering improved throughput (up to 40 percent) for large, batch jobs. ...
The functionality of modern multi-core processors is often driven by a given power budget that requires designers to evaluate different decision trade-offs, e.g., to choose between many slow, power-efficient ...
We select 13 diverse MapReduce applications [2] to run experiments in our Hadoop cluster. The high-level description of these applications is given in Table 2. ...
doi:10.1109/tcc.2015.2415772
fatcat:ls62o7yt2vbulnpw5zfwzj5lui
MapReduce Scheduler: A 360-degree view
[article]
2017
arXiv
pre-print
In this paper, we provide in-depth insight on the MapReduce scheduling algorithm. ...
Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential, and it can make computation faster. ...
A novel job scheduling technique is proposed, the Throughput-driven [29] task scheduler, for obtaining high system throughput in the job-intensive MapReduce environment. ...
arXiv:1704.02632v1
fatcat:44kf7vncjjha7paeilhyuupj5a
Performance-driven task co-scheduling for MapReduce environments
2010
2010 IEEE Network Operations and Management Symposium - NOMS 2010
In this paper, we address this problem by introducing a new task scheduler for a MapReduce framework that allows performance-driven management of MapReduce tasks. ...
MapReduce is a data-driven programming model proposed by Google in 2004 which is especially well suited for distributed data analytics applications. ...
Task selection and slave node assignment govern a job's opportunity to make progress, and thus influence job performance. ...
doi:10.1109/noms.2010.5488494
dblp:conf/noms/PoloCBSW10
fatcat:2r4ljikgonb23nv4vurgqvll7e
Heterogeneous cores for MapReduce processing: Opportunity or challenge?
2014
2014 IEEE Network Operations and Management Symposium (NOMS)
... to choose between many slow cores, fewer faster cores, or to select a combination of them. ...
In this work, we design a new Hadoop scheduler, called DyScale, that exploits capabilities offered by heterogeneous cores for achieving a variety of performance objectives. ...
Job scheduling in Hadoop is performed by a master node called the JobTracker, which manages a number of worker nodes in the cluster. ...
doi:10.1109/noms.2014.6838339
dblp:conf/noms/YanCZS14
fatcat:jliihgk4ozbl3il4slco4r5ihe
Minimizing Remote Accesses in MapReduce Clusters
2013
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum
MapReduce, in particular Hadoop, is a popular framework for the distributed processing of large datasets on clusters of relatively inexpensive servers. ...
We find that, in an unconstrained cluster where a job's map tasks may be scheduled dynamically on any node over time, Hadoop's default random data placement is effective in avoiding remote accesses. ...
In contrast to the authors' approach, we focus on how intelligent data placement can be used to maximize MapReduce efficiency in scenarios where node allocations are restricted. ...
doi:10.1109/ipdpsw.2013.195
dblp:conf/ipps/TandonCW13
fatcat:3c3yo5aqevcpbpuagvgp3udnqi
Governing energy consumption in Hadoop through CPU frequency scaling: An analysis
2016
Future Generation Computer Systems
In this paper, we focus on MapReduce processing and we investigate the impact of dynamically scaling the frequency of compute nodes on the performance and energy consumption of a Hadoop cluster. ...
Most large-scale data computations in the cloud heavily rely on the MapReduce paradigm and on its Hadoop implementation. ...
tuning policy -when relying on static MapReduce phase detection, similar to [
Dynamic tuning policies We study two coarse-grained frequency tuning policies: • Dynamic CPU frequency. ...
doi:10.1016/j.future.2015.01.005
fatcat:tr7ld7ykkjcypgpt4eyr4accmm
Factors affecting cloud data-center efficiency: a scheduling algorithm-based analysis
2021
International Journal of Advanced Technology and Engineering Exploration
Google developed the MapReduce programming paradigm to counter this problem, which served as the foundation for Apache's open-source Hadoop project. ...
In addition, many of these kinds of platforms must have features like parallel processing, fault tolerance, data dissemination, scalability, availability, and load balancing. ...
[58] proposed a load-driven Dynamic Slot Controller (DSC) technique that modifies the map and reduce task slots in response to the slave nodes' workload. ...
doi:10.19101/ijatee.2021.874313
fatcat:2bwdcxpac5ccddxdcgpshrrbye
Rough Sets Base Associative Classification Rules Extraction from Big Data
2019
VOLUME-8 ISSUE-10, AUGUST 2019, REGULAR ISSUE
But in our literature review, we find that for the MapReduce system there is an absolute lack of rough set-based techniques. ...
Many conventional data analytics techniques have been extended to the MapReduce framework to process Big Data. ...
In conflicting data, an RST-driven rule generation system may generate minor and non-redundant rule sets. ...
doi:10.35940/ijitee.a9140.119119
fatcat:p6sn2hndyrfhbdsnwjauybdpba
Towards a data-centric view of cloud security
2010
Proceedings of the second international workshop on Cloud data management - CloudDB '10
In particular, we explore the security properties of secure data sharing between applications hosted in the cloud. ...
In this paper, we take an alternative perspective and propose a data-centric view of cloud security. ...
In the Filtering Phase, the master node distributes 6,400 randomly selected webpages from the Stanford Web-Base project [2] to map workers. ...
doi:10.1145/1871929.1871934
dblp:conf/cikm/ZhouSMZTLL10
fatcat:jajge32swnhkhks2fykd5hrb7i
A contention aware hybrid evaluator for schedulers of big data applications in computer clusters
2014
2014 IEEE International Conference on Big Data (Big Data)
This method is then implemented in Mumak, a popular Hadoop job-trace simulator making it contention-aware. ...
This paper presents a Trace Driven Analytic Model (TDAM) methodology to assess the impact of different scheduling schemes on job execution times. ...
We now discuss experiments based on a production-level map-only MapReduce job run on a 7-node cluster with 4 cores per node. ...
doi:10.1109/bigdata.2014.7004439
dblp:conf/bigdataconf/BardhanM14
fatcat:cnel7wbznbhhnlbyz4hplbnox4
STEAMEngine: Driving MapReduce provisioning in the cloud
2011
2011 18th International Conference on High Performance Computing
MapReduce has gained in popularity as a distributed data analysis paradigm, particularly in the cloud, where MapReduce jobs are run on virtual clusters. ...
Our experimental results based on an Amazon EC2 cluster and a local 6-node Xen/Hadoop cluster show the benefits of STEAMEngine through improvements in performance and energy via the use of these algorithms ...
Each node in a MapReduce cluster operates on equal sized blocks which contain similar content (records). ...
doi:10.1109/hipc.2011.6152649
dblp:conf/hipc/CardosaNCPS11
fatcat:2syypqwkdbadfcsz6k7qwhqffe