666 Hits in 4.4 sec

Optimising Bootstrapping Algorithms Using R and Hadoop

Shicai Wang, Mihaela A. Mares, Yike Guo
2015 2015 IEEE 35th International Conference on Distributed Computing Systems Workshops  
Therefore there is a demand for a large number of algorithm runs on several data replicates, and with the expected increase in dataset sizes, high performance parallel optimisation becomes mandatory.  ...  We conclude that R on HDFS holds great promise for methods based on resampling or bootstrapping, in particular when the number of dataset replications decreases the algorithm error, such as we demonstrated  ...  The pre-scheduled method saved 35.57% of the time taken by the basic method. HDFS is applied to the fine-grained 300-sample test, compared to the EXT4-based fine-grained 300-sample test.  ... 
doi:10.1109/icdcsw.2015.34 dblp:conf/icdcsw/WangMG15 fatcat:vw6z3ukhjva6hnbdgqmxgdlieq

IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems

Bo Feng, Xi Yang, Kun Feng, Yanlong Yin, Xian-He Sun
2015 2015 IEEE International Conference on Cluster Computing  
This method can not only help to diagnose system bottlenecks but also further optimize performance.  ...  To achieve this goal, we propose a transparent tracing and analysis tool suite, namely IOSIG+, which can be plugged into Hadoop system.  ...  This work is supported in part by a research grant from Huawei Technologies Co, Ltd., the US National Science Foundation under Grant No. CNS-0751200, CCF-0937877, CNS-1162540, and CNS-1338078.  ... 
doi:10.1109/cluster.2015.17 dblp:conf/cluster/FengYFYS15 fatcat:fwsuiikjtbb2xcprkfjsrvcaxq


Wing Lung Ngai, Tim Hegeman, Stijn Heldens, Alexandru Iosup
2017 Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems - GRADES'17  
In this work, we propose Granula, a performance analysis system for Big Data platforms that focuses on graph processing.  ...  It offers a comprehensive evaluation process that can be iteratively tuned to deliver more fine-grained performance information.  ...  , and the costly, fine-grained analysis; in analyzing data-processing end-to-end; and in sharing performance results for the entire community of analysts  ... 
doi:10.1145/3078447.3078455 dblp:conf/grades/NgaiHHI17 fatcat:y6lxqnhbsrgzjdronauexqtroy

The Hadoop distributed filesystem: Balancing portability and performance

Jeffrey Shafer, Scott Rixner, Alan L Cox
2010 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)  
Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem.  ...  This paper analyzes the performance of HDFS and uncovers several performance issues.  ...  In such workloads, storage latency is of equal importance to storage bandwidth; thus, fine-grained fairness is provided at a small granularity (a few hundred kilobytes or less).  ... 
doi:10.1109/ispass.2010.5452045 dblp:conf/ispass/ShaferRC10 fatcat:zh7rc246aje5vopzrhxe3vjbbi

Zput: A speedy data uploading approach for the Hadoop Distributed File System

Youwei Wang, Weiping Wang, Can Ma, Dan Meng
2013 2013 IEEE International Conference on Cluster Computing (CLUSTER)  
Hadoop Distributed File System (HDFS) is the storage component of the Hadoop framework, which is designed for maintaining and processing huge datasets efficiently among cluster nodes.  ...  The primary contribution of this paper is the proposition of Zput, a speedy data uploading mechanism which can significantly accelerate uploading by using a metadata mapping approach.  ...  more efficient algorithms; III. using more flexible and fine-grained placement algorithms to colocate data and computation to boost query performance.  ... 
doi:10.1109/cluster.2013.6702648 dblp:conf/cluster/WangWMM13 fatcat:5q3cfq55xzg2vl543iaxzpzh4i

A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing

Zhixin Li, Dandan Su, Haijiang Zhu, Wei Li, Fan Zhang, Ruirui Li
2017 Sensors  
Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased.  ...  Although the time-domain SAR raw data simulation algorithm has been improved for smaller time complexity, the optimization still does not achieve satisfactory performance.  ...  data simulation can be divided into a coarse-grained strategy and a fine-grained one, as shown in Figure 1.  ... 
doi:10.3390/s17010113 pmid:28075343 pmcid:PMC5298686 fatcat:vxpxeftrivd4tpzem64w7zkcke

Toward scalable internet traffic measurement and analysis with Hadoop

Yeonhee Lee, Youngseok Lee
2012 Computer communication review  
From experiments with a 200-node testbed, we achieved 14 Gbps throughput for 5 TB files with IP and HTTP-layer analysis MapReduce jobs.  ...  In this paper, we present a Hadoop-based traffic monitoring system that performs IP, TCP, HTTP, and NetFlow analysis of multi-terabytes of Internet traffic in a scalable manner.  ...  Acknowledgment We thank the anonymous reviewers for their helpful feedback and Sue Moon for her kind guidance in developing the final draft.  ... 
doi:10.1145/2427036.2427038 fatcat:43elfcm5kbdbbojjvj7ljevwmm

A virtual shared metadata storage for HDFS

Jiang Zhou, Yong Chen, Xiaoyan Gu, Weiping Wang, Dan Meng
2015 2015 IEEE International Conference on Networking, Architecture and Storage (NAS)  
Two strategies, a journal synchronization based on the 2PC protocol and a fine-grained image replication, are introduced in the VSSP according to different metadata access features.  ...  HDFS, a distributed file system, is implemented to provide high-throughput access to datasets. HDFS can achieve high-performance metadata service but has two disadvantages.  ...  As HDFS uses a single metadata server, it suffers from a performance bottleneck as the cluster scale increases [5], [6].  ... 
doi:10.1109/nas.2015.7255195 dblp:conf/nas/ZhouCGWM15 fatcat:otndhyex5bashdimmsgz3wspze

The Case for Limping-Hardware Tolerant Clouds

Thanh Do, Haryadi S. Gunawi
2013 USENIX Workshop on Hot Topics in Cloud Computing  
This era is confronted with a new challenge: performance variability, primarily caused by large-scale management issues such as hardware failures, software bugs, and configuration mistakes.  ...  In this paper, we highlight one overlooked cause: limping hardware, i.e., hardware whose performance degrades significantly compared to its specification.  ...  The experiments in this paper were performed in the Utah Emulab network testbed [2].  ... 
dblp:conf/hotcloud/DoG13 fatcat:enmgzpyizjaulnvvr4gwhn4yzq

Data Deduplication Technology for Cloud Storage

2020 Tehnički Vjesnik  
In this paper, we design a file deduplication framework on the Hadoop distributed file system for cloud application developers.  ...  At the end of the paper, we test the disk utilisation and the file upload performance of RFD-HDFS and FD-HDFS, and compare the disk utilisation of HDFS with that of the two system frameworks.  ...  More fine-grained deduplication creates more opportunities to save storage space but has a significant performance impact.  ... 
doi:10.17559/tv-20200520034015 fatcat:7nux46ma5fgmvapqcq6vlhynd4

wPerf: Generic Off-CPU Analysis to Identify Bottleneck Waiting Events

Fang Zhou, Yifan Gan, Sixiang Ma, Yang Wang
2018 USENIX Symposium on Operating Systems Design and Implementation  
Acknowledgements Many thanks to our shepherd Cristiano Giuffrida and to the anonymous reviewers for their insightful comments.  ...  I/Os; for load imbalance, one could consider fine-grained task scheduling; for lock contention, one could consider fine-grained locking.  ...  While there are systematic solutions for on-CPU analysis (e.g., Critical Path Analysis [40] and COZ [16] ), existing off-CPU analysis methods are either inaccurate or incomplete.  ... 
dblp:conf/osdi/ZhouGMW18 fatcat:nk246l4ptzg4lgk7j33cpu24xa

Clash of the titans

Juwei Shi, Yunjie Qiu, Umar Farooq Minhas, Limei Jiao, Chen Wang, Berthold Reinwald, Fatma Özcan
2015 Proceedings of the VLDB Endowment  
; (2) We provide a break-down of the task execution time for in-depth analysis.  ...  To conduct a detailed analysis, we developed two profiling tools: (1) We correlate the task execution plan with the resource utilization for both MapReduce and Spark, and visually present this correlation  ...  Fine-grained Time Break-down: To understand where time goes for the shuffle component, we provide the fine-grained execution time break-down for selected tasks.  ... 
doi:10.14778/2831360.2831365 fatcat:gxxhkrlj4naabjgniqmeao3dci

Workload characterization on a production Hadoop cluster: A case study on Taobao

Zujie Ren, Xianghua Xu, Jian Wan, Weisong Shi, Min Zhou
2012 2012 IEEE International Symposium on Workload Characterization (IISWC)  
In addition, we use these job analysis statistics to derive several implications for potential performance optimization solutions.  ...  MapReduce is becoming the state-of-the-art computing paradigm for processing large-scale datasets on a large cluster with tens or thousands of nodes.  ...  We are grateful to Jianying, Yunzheng, Wuwei, Tuhai, Zeyuan for their insightful suggestions.  ... 
doi:10.1109/iiswc.2012.6402895 dblp:conf/iiswc/RenXWSZ12 fatcat:khnayz47mvclvo32vluqg2ffdu

Next-Generation Big Data Federation Access Control: A Reference Model [article]

Feras M. Awaysheh, Mamoun Alazab, Maanak Gupta, Tomás F. Pena, José C. Cabaleiro
2019 arXiv   pre-print
The proposed access broker does not substantially increase the performance overhead. The experimental results show an overhead of only 1% for each 100 MB read/write operation in WebHDFS.  ...  Overall, the findings of the paper pave the way for a wide range of revolutionary and state-of-the-art enhancements and future trends within Hadoop stack security and privacy.  ...  Ulusoy et al. proposed an approach for fine-grained access control authorization for MapReduce systems in [31, 32].  ... 
arXiv:1912.11588v1 fatcat:3cot2oog6jedbilw2it3w7m47i


Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica
2013 Proceedings of the 2013 international conference on Management of data - SIGMOD '13  
Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a MapReduce-like execution engine, and the fine-grained fault tolerance properties that such engines provide  ...  Shark is a new data analysis system that marries query processing with complex analytics on large clusters.  ...  Third, the complexity of data analysis has also grown: modern data analysis employs sophisticated statistical methods, such as machine learning algorithms, that go well beyond the roll-up and drill-down  ... 
doi:10.1145/2463676.2465288 dblp:conf/sigmod/XinRZFSS13 fatcat:qs4bvu7habd77g42mtm3m5sgoy