
Configuring a MapReduce Framework for Performance-Heterogeneous Clusters

Jessica Hartog, Renan Delvalle, Madhusudhan Govindaraju, Michael J. Lewis
2014 2014 IEEE International Congress on Big Data  
Popular frameworks supporting the effective MapReduce programming model for Big Data applications do not flexibly adapt to these environments.  ...  Our results suggest that frameworks should support finer grained sub-tasking and dynamic data partitioning when running on some performance-heterogeneous clusters.  ...  Finer-Grained Splits: Figure 2 plots data for the same set of tests as Figure 1, for the finest granularity of the initial data split.  ... 
doi:10.1109/bigdata.congress.2014.26 dblp:conf/bigdata/HartogDGL14 fatcat:7xkyijf3sjhtvms5evdvbjfdty

To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload

Stefan Ene, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu
2014 2014 5th International Workshop on Data-Intensive Computing in the Clouds  
Our key finding shows that overlapping the transfer time with as many incremental computations as possible is not always efficient: a better solution is to wait for enough to fill the computational capacity  ...  We analyze the feasibility of incremental MapReduce approaches to advance the computation as much as possible during the data upload by using already transferred data to calculate intermediate results.  ...  To cope with the challenges posed by the need to process massive input data, a possibly better solution would consist in overlapping data transfer with data processing.  ... 
doi:10.1109/datacloud.2014.7 dblp:conf/sc/EneNCA14 fatcat:bqqhfzmz6fhljf5tk7rmgisyya

An improved partitioning mechanism for optimizing massive data analysis using MapReduce

Kenn Slagter, Ching-Hsien Hsu, Yeh-Ching Chung, Daqiang Zhang
2013 Journal of Supercomputing  
Big Data is difficult to work with and requires massively parallel software running on a large number of computers.  ...  MapReduce is a recent programming model that simplifies writing distributed applications that handle Big Data.  ...  their support and for their help on earlier drafts of this paper.  ... 
doi:10.1007/s11227-013-0924-9 fatcat:yaj6vvbwbzeproplaxdty3luai

Towards efficient resource provisioning in MapReduce

Peter P. Nghiem, Silvia M. Figueira
2016 Journal of Parallel and Distributed Computing  
In the era of Big Data, energy efficiency has become an important issue for the ubiquitous Hadoop MapReduce framework.  ...  for any workload running on Hadoop MapReduce.  ...  its momentum of popularity for in-memory processing of Big Data analytic applications with better sorting performance for large clusters.  ... 
doi:10.1016/j.jpdc.2016.04.001 fatcat:nr2ubgp2p5dm3fyfrv7cbux2r4

Efficient iterative processing in the SciDB parallel array engine

Emad Soroush, Magdalena Balazinska, Simon Krughoff, Andrew Connolly
2015 Proceedings of the 27th International Conference on Scientific and Statistical Database Management - SSDBM '15  
These engines efficiently support various types of operations, but none includes native support for iterative processing.  ...  Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing.  ...  In many applications, it is often efficient to first process the low-resolution versions of the data and use the result to speed up the processing of finer-resolution versions.  ... 
doi:10.1145/2791347.2791362 dblp:conf/ssdbm/SoroushBKC15 fatcat:kjfqzmtdhvfjxevv6a4viujd34

RAMS: Real-time Anomaly Monitoring System

M. Vidhya Sri
2018 International Journal for Research in Applied Science and Engineering Technology  
big data to the entire Weibo or Twitter text stream with continuous horizontal extensibility.  ...  Microblog platforms have been exceptionally happening in the big data epoch due to its real-time dissipation of facts and figures.  ...  subpartitions. 3) Processing Data Module: To efficiently update graph structure, we introduced a hash based graph partitioning method to support fine-grained and rapid update.  ... 
doi:10.22214/ijraset.2018.3612 fatcat:cxlm3v7lqfdmjauri5owzyvgzq

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks [article]

Sanaa Hamid Mohamed, Taisir E.H. El-Gorashi, Jaafar M.H. Elmirghani
2019 arXiv   pre-print
However, the increasing traffic between and within the data centers that migrate, store, and process big data, is becoming a bottleneck that calls for enhanced infrastructures capable of reducing the congestion  ...  MapReduce and Hadoop thus introduce innovative, efficient, and accelerated intensive computations and analytics.  ...  All data are provided in full in the results section of this paper.  ... 
arXiv:1910.00731v1 fatcat:kvi3br4iwzg3bi7fifpgyly7m4

Apache Spark

Matei Zaharia, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, Ion Stoica, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng (+2 others)
2016 Communications of the ACM  
for new workloads; for example, MapReduce [4] supported batch processing, but Google also developed Dremel [13]. Apache Spark: A Unified Engine for Big Data Processing. Key insights: A simple programming model  ...  can capture streaming, batch, and interactive workloads and enable new applications that combine them. Apache Spark applications range from finance to scientific data processing and combine libraries  ...  Since its release in 2010, Spark has grown to be the most active open source project for big data processing, with more than 1,000 contributors.  ... 
doi:10.1145/2934664 fatcat:zqffhrnl4rhk5ayrdyv25aiyeq

A Survey of State Management in Big Data Processing Systems [article]

Quoc-Cuong To, Juan Soto, Volker Markl
2018 arXiv   pre-print
State management and its use in diverse applications varies widely across big data processing systems.  ...  Consequently, this presents a problem for big data frameworks with imperative machine learning algorithms, given that fine-grained access to large state is required.  ...  big data processing.  ... 
arXiv:1702.01596v4 fatcat:474aqppfpjhdrkslqrawrmnck4

Bridging the gap between applications and networks in data centers

Paolo Costa
2013 ACM SIGOPS Operating Systems Review  
A common property of MapReduce (and in general of "Big Data" applications) is that often data is aggregated during the process and the output size is a fraction of the input size.  ...  For example, this may include high-level domain specific languages such as Cloud Haskell [31], as well as low-level languages like Click [44], which allow for finer-grained control.  ... 
doi:10.1145/2433140.2433143 fatcat:ha26mahdkrd5nppw6nabmgqz3i

Benchmarking Big Data Systems: State-of-the-Art and Future Directions [article]

Rui Han, Zhen Jia, Wanling Gao, Xinhui Tian, Lei Wang
2015 arXiv   pre-print
This article investigates the state-of-the-art in benchmarking big data systems along with the future challenges to be addressed to realize a successful and efficient benchmark.  ...  The great prosperity of big data systems such as Hadoop in recent years makes the benchmarking of these systems become crucial for both research and industry communities.  ...  ACKNOWLEDGMENTS This technical report is a significant extended version of its preliminary version entitled "On Big Data Benchmarking", which is published in BPOE-4 (Co-located with ASPLOS 2014) [36]  ... 
arXiv:1506.01494v1 fatcat:3icae6wgjjfj7afsmlzppd4e2q

A platform for scalable one-pass analytics using MapReduce

Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, Prashant Shenoy
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
Today's one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently.  ...  MapReduce is a popular programming model for processing large datasets using a cluster of machines.  ...  into HDFS, which was very time-consuming. For either real running time or modeled time cost, the 100 data points were interpolated into a finer-grained mesh.  ... 
doi:10.1145/1989323.1989426 dblp:conf/sigmod/LiMDMS11 fatcat:qwc5m7lylrdzfbsogn5lpfb2mu

A Survey on Automatic Parameter Tuning for Big Data Processing Systems

Herodotos Herodotou, Yuxing Chen, Jiaheng Lu
2020 ACM Computing Surveys  
for executing jobs in big data processing systems.  ...  Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression.  ...  [75], Flink [3], Samza [5]), have emerged to assist with the Big Data challenge, i.e., to efficiently collect, process, and analyze massive volumes of heterogeneous data.  ... 
doi:10.1145/3381027 fatcat:7aglimtuwze25boptuano4ufdy

Early Accurate Results for Advanced Analytics on MapReduce [article]

Nikolay Laptev, Kai Zeng, Carlo Zaniolo
2012 arXiv   pre-print
Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems although these are intended for 'big data'.  ...  Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of  ...  We thank Mark Handcock for his inspiring guidance and Alexander Shkapsky for his helpful discussions. We also thank the anonymous reviewers of this paper for their valuable feedback.  ... 
arXiv:1207.0142v1 fatcat:dx2li6sb5zavxgwe2ptiz46qlm

Scalable RDF Data Compression using X10 [article]

Long Cheng, Avinash Malik, Spyros Kotoulas, Tomas E Ward, Georgios Theodoropoulos
2014 arXiv   pre-print
The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings.  ...  A typical method for alleviating the impact of this problem is through the use of compression methods that produce more compact representations of the data.  ...  Although their evaluation on Hadoop has shown that their system is efficient and scales well, as we will show in Section VII, our approach is both faster and more flexible, exploiting the finer-grain control  ... 
arXiv:1403.2404v1 fatcat:isjn3vqu4rfybgv3pqpc36o7r4