
M3R: Increased performance for in-memory Hadoop jobs [article]

Avraham Shinnar, David Cunningham, Benjamin Herta, Vijay Saraswat
2012 arXiv   pre-print
M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.  ...  In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets -- while providing  ...  very good performance for jobs that can fit in the size of cluster memory.  ... 
arXiv:1208.4168v1 fatcat:lnsnqc2ak5adblsp4ojzctbfb4

M3R

Avraham Shinnar, David Cunningham, Vijay Saraswat, Benjamin Herta
2012 Proceedings of the VLDB Endowment  
M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.  ...  In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets -- while providing  ...  very good performance for jobs that can fit in the size of cluster memory.  ... 
doi:10.14778/2367502.2367513 fatcat:bnh6bmmorfdstgcbmczabbo5tu

Hone

K. Ashwin Kumar, Jonathan Gluck, Amol Deshpande, Jimmy Lin
2013 Proceedings of the VLDB Endowment  
The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine.  ...  Additionally, we are seeing increased sophistication in analytics, e.g., machine learning, which generally operates over smaller and more refined datasets.  ...  Namespace manager: This module manages memory assignment to enable data reading and writing for MapReduce jobs.  ... 
doi:10.14778/2536274.2536314 fatcat:fuj7i33ltzh7dfaqb73r3khama

A Semi-clustering Scheme for Large-Scale Graph Analysis on Hadoop [chapter]

Seungtae Hong, Youngsung Shin, Dong Hoon Choi, Heeseung Jo, Jae-woo Chang
2014 Lecture Notes in Electrical Engineering  
In this paper, we propose a semi-clustering scheme for large-scale graph analysis such as PageRank algorithm on Hadoop and show that the proposed scheme is effective.  ...  As a result, there are a lot of research results in large-scale graph analysis on Hadoop.  ...  There are various kinds of parallel programming models that can be used for analyzing the large-scale graphs, for example Hadoop [1], Pregel [2], and M3R [3].  ... 
doi:10.1007/978-3-642-40675-1_46 fatcat:it43m6erj5e2pmwbspoz34zqsy

Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases

Jorge Veiga, Roberto R. Expósito, Bruno Raffin, Juan Touriño
2018 IEEE Access  
This paper studies the use of Flame-MR, an in-memory processing architecture for MapReduce applications, to improve the performance of real-world use cases in a transparent way while keeping application  ...  Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data.  ...  ACKNOWLEDGMENT The authors would like to thank Iván Cores for his contribution to the deployment of VELaSSCo, and also Pierre Neyron and Michael Mercier for their help in the use of the Grid'5000 platform  ... 
doi:10.1109/access.2018.2880842 fatcat:6gzwm3a5czdudnf2y56zonxzdi

FP-Hadoop: Efficient processing of skewed MapReduce jobs

Miguel Liroz-Gistau, Reza Akbarinia, Divyakant Agrawal, Patrick Valduriez
2016 Information Systems  
We achieved excellent performance gains compared to native Hadoop, e.g. more than 10 times in reduce time and 5 times in total execution time.  ...  Although these key-based frameworks have been praised for their high scalability and fault tolerance, they show poor performance in the case of data skew.  ...  M3R is very efficient, but can be used only for the applications in which intermediate key-values can fit in memory.  ... 
doi:10.1016/j.is.2016.03.008 fatcat:47s4osze5bh23gyofjxfeaam2a

A survey of large-scale analytical query processing in MapReduce

Christos Doulkeridis, Kjetil Nørvåg
2013 The VLDB Journal  
This survey aims to review the state of the art in improving the performance of parallel query processing using MapReduce.  ...  However, despite its merits, MapReduce has evident performance limitations in miscellaneous analytical tasks, and this has given rise to a significant body of research that aim at improving its efficiency  ...  Acknowledgments We would like to thank the editors and the anonymous reviewers for their very helpful comments that have significantly improved this paper. The research of C.  ... 
doi:10.1007/s00778-013-0319-9 fatcat:3gkpguiwnre2jduhjssuqgydfq

PortHadoop: Support direct HPC data processing in Hadoop

Xi Yang, Ning Liu, Bo Feng, Xian-He Sun, Shujia Zhou
2015 2015 IEEE International Conference on Big Data (Big Data)  
In recent years, there is a growing interest in the High Performance Computing (HPC) community to use Hadoop-based tools for processing scientific data.  ...  PortHadoop keeps all the semantics in the original Hadoop system and PFS.  ...  This research is also supported in part by NSF under NSF grants CNS-0751200, CCF-0937877, and CNS-1162540.  ... 
doi:10.1109/bigdata.2015.7363759 dblp:conf/bigdataconf/YangLFSZ15 fatcat:7mubyjhbvfgmzlevy4x5x4c6dm

Flame-MR: An event-driven architecture for MapReduce applications

Jorge Veiga, Roberto R. Expósito, Guillermo L. Taboada, Juan Touriño
2016 Future Generation Computer Systems  
As the data size managed by MapReduce applications is steadily increasing, the need for improving the Hadoop performance also grows.  ...  This paper proposes Flame-MR, a new event-driven MapReduce architecture that increases Hadoop performance by avoiding memory copies and pipelining data movements, without modifying the source code of the  ...  We thankfully acknowledge the Advanced School for Computing and Imaging (ASCI) and the Vrije University Amsterdam for providing access to the DAS-4 cluster.  ... 
doi:10.1016/j.future.2016.06.006 fatcat:dhjurssi2va3dgnpeseisxdcfq

Chabok: a Map-Reduce based method to solve data warehouse problems

Mohammadhossein Barkhordari, Mahdi Niamanesh
2018 Journal of Big Data  
In this method, aggregation is performed completely on Mappers, and intermediate results are sent to the Reducer. Chabok does not need data replication for join omission.  ...  The proposed method was implemented on Hadoop, and TPC-DS queries were executed for benchmarking. The query execution time on Chabok surpassed prominent big data products for data warehousing.  ...  The M3R method improves Hadoop performance by omitting portions such as Heart beat or Job Tracker.  ... 
doi:10.1186/s40537-018-0144-5 fatcat:hzfn5lgtcnczhk55imhlgdvaiy

Massively Parallel Databases and MapReduce Systems

Shivnath Babu
2012 Foundations and Trends in Databases  
Timely and cost-effective analytics over "big data" has emerged as a key ingredient for success in many businesses, scientific and engineering disciplines, and government endeavors.  ...  The need to convert this raw data into useful information has spawned considerable innovation in systems for large-scale data analytics, especially over the last decade.  ...  Memory-based extensions and improvements on current systems have also been proposed. M3R (Main Memory MapReduce) [175] is a framework that extends Hadoop for running MapReduce jobs in memory.  ... 
doi:10.1561/1900000036 fatcat:5moo66w5aneyppq3xpib4wtaam

Using MapReduce Streaming for Distributed Life Simulation on the Cloud

Atanas Radenski
2013 Advances in Artificial Life, ECAL 2013  
Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.  ...  We implement and empirically evaluate our algorithms' performance on Amazon's Elastic MR cloud.  ...  Thanks are due to the anonymous reviewers for their valuable comments and recommendations.  ... 
doi:10.7551/978-0-262-31709-2-ch043 dblp:conf/ecal/Radenski13 fatcat:ctuuqhazxbg3xkjwohb2pkyil4

Deca

Xuanhua Shi, Zhixiang Ke, Yongluan Zhou, Hai Jin, Lu Lu, Xiong Zhang, Ligang He, Zhenyu Hu, Fei Wang
2019 ACM Transactions on Computer Systems  
the similar performance comparing to domain specific systems.  ...  In-memory caching of intermediate data and active combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in big data processing systems  ...  ACKNOWLEDGMENTS We thank the anonymous reviewers for their valuable comments on earlier versions of this paper, and we thank Alibaba Computing Platform team members for their support and collaboration.  ... 
doi:10.1145/3310361 fatcat:d5z767ar4rd6xdp4z4sxnpkefi

In-Memory Big Data Management and Processing: A Survey

Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Meihui Zhang
2015 IEEE Transactions on Knowledge and Data Engineering  
We also give a comprehensive presentation of important technology in memory management, and some key factors that need to be considered in order to achieve efficient in-memory data management and processing  ...  Some issues such as fault-tolerance and consistency are also more challenging to handle in in-memory environment.  ...  We would like to thank the anonymous reviewers, and also Bingsheng He, Eric Lo and Bogdan Marius Tudor, for their insightful comments and suggestions.  ... 
doi:10.1109/tkde.2015.2427795 fatcat:u7r3rtvhxbainfeazfduxcdwrm

FP-Hadoop: Efficient Processing of Skewed MapReduce Jobs

Miguel Liroz-Gistau, Reza Akbarinia, Divyakant Agrawal, Patrick Valduriez
2016 Information Systems   unpublished
We achieved excellent performance gains compared to native Hadoop, e.g. more than 10 times in reduce time and 5 times in total execution time.  ...  Although these key-based frameworks have been praised for their high scalability and fault tolerance, they show poor performance in the case of data skew.  ...  M3R is very efficient, but can be used only for the applications in which intermediate key-values can fit in memory.  ... 
fatcat:xtjz2qfixzhzjlypumuidla2t4