
Dynamic Job Ordering and Slot Configurations for MapReduce Workloads

Shanjiang Tang, Bu-Sung Lee, Bingsheng He
2016 IEEE Transactions on Services Computing  
This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload.  ...  MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers.  ...  [33] presented an I/O-efficient MapReduce system called Themis that improves the performance of MapReduce by minimizing the number of I/O operations.  ... 
doi:10.1109/tsc.2015.2426186 fatcat:wcvmhd63ubfofg3t5wtjoz7gka

Achieving cost-efficient, data-intensive computing in the cloud

Michael Conley, Amin Vahdat, George Porter
2015 Proceedings of the Sixth ACM Symposium on Cloud Computing - SoCC '15  
Themis MapReduce derives much of its I/O-efficiency from its pipelined implementation, which limits the amount of extraneous I/O relative to frameworks like Hadoop [39].  ...  We now model the performance of Themis under several assumptions about I/O efficiency and data durability. 2-IO: Because Themis eschews traditional task-level fault tolerance, it exhibits the 2-IO property  ... 
doi:10.1145/2806777.2806781 dblp:conf/cloud/ConleyVP15 fatcat:tfphsj7yfjfrtdfsuukjgxgtci

Efficient Mapreduce Workloads based on Slot Configuration and Job Ordering

B Susmitha, N Kalyani
2017 International Journal of Computer & Mathematical Sciences IJCMS   unpublished
Hadoop, an open-source implementation of MapReduce, is deployed in large clusters of many machines at organizations such as Instagram and Twitter.  ...  In these cluster and datacenter environments, MapReduce and Hadoop are used for batch processing jobs submitted by multiple users (i.e., MapReduce workloads).  ...  Rasmussen et al. presented an I/O-efficient MapReduce system called Themis that improves the performance of MapReduce by minimizing the number of I/O operations.  ... 

PortHadoop: Support direct HPC data processing in Hadoop

Xi Yang, Ning Liu, Bo Feng, Xian-He Sun, Shujia Zhou
2015 2015 IEEE International Conference on Big Data (Big Data)  
In this study, we propose PortHadoop, an enhanced Hadoop architecture that enables MapReduce applications to read data directly from HPC parallel file systems (PFS).  ...  The success of the Hadoop MapReduce programming model has greatly propelled research in big data analytics.  ...  Therefore, this procedure guarantees both data integrity and I/O efficiency. In summary, PortHadoop implements three split alignment strategies, as shown in Figure 5, to support split alignment.  ... 
doi:10.1109/bigdata.2015.7363759 dblp:conf/bigdataconf/YangLFSZ15 fatcat:7mubyjhbvfgmzlevy4x5x4c6dm

29th International Conference on Data Engineering [book of abstracts]

2013 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW)  
Then, based on a novel cost model, we propose an I/O-efficient strategy to evaluate SPARQL queries as quickly as possible, especially queries with solution modifiers specified, e.g., PROJECTION, ORDER  ...  WeD/10 EAGRE: Towards Scalable I/O Efficient SPARQL Query Evaluation on the Cloud Xiaofei Zhang, Lei Chen, Yongxin Tong (Hong Kong University of Science and Technology) Min Wang (HP Labs China) To  ... 
doi:10.1109/icdew.2013.6547409 fatcat:wadzpuh3b5htli4mgb4jreoika

Program book

2010 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)  
RIOT makes R programs I/O-efficient in a way transparent to users. It features a flexible array storage manager and an optimization engine suitable for statistical and numerical operations.  ...  I/O-Efficient Statistical Computing with RIOT Yi Zhang, Weiping Zhang, Jun Yang; Duke University, USA Statistical analysis of massive data is becoming indispensable to science, commerce, and society today  ...  SMDB'10 will be a one-day workshop where accepted papers are presented in an informal and interactive setting. Participation in the workshop is not limited to authors of accepted papers.  ... 
doi:10.1109/icdew.2010.5452773 fatcat:oyq2tujbvjfpxjlyixux5q57vu

A Framework for Integrating IoT Streaming Data from Multiple Sources

Quang Tu Doan
In detail, their index is built on the segmented sections of an incoming record stream, which are stored in files for disk I/O efficiency.  ...  Pairwise document similarity in large collections with mapreduce.  ... 
doi:10.26181/17211713 fatcat:6ahlrck3t5gs3onejnojd72ray