A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Putting a "big-data" platform to good use
2012
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing - HPDC '12
[Figure 1: DryadLINQ software stack. Components shown: Windows Server; cluster services (remote execution, naming, management); storage [5]; distributed execution [9]; compiler [11,14,16,17]; libraries and frameworks [1,6,7,13]; scheduling [10]; caching and data management [8,15]; monitoring, debugging [4,12]; applications [2,18].] ...
We have built a simple distributed filesystem, TidyFS [5]. All cluster machines have local disks, providing persistent storage; these disks are aggregated by TidyFS into a global filesystem. ...
doi:10.1145/2287076.2287078
dblp:conf/hpdc/Budiu12
fatcat:aop475bvbvbbbcjbr6tmbb42dq
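The snippet above says TidyFS aggregates each machine's local disk into one global filesystem. A minimal sketch of that idea, assuming a central metadata map from a global file name to the parts stored on individual machines, might look like this. The class `GlobalNamespace`, its methods, and the example machine names and paths are all hypothetical illustrations, not the TidyFS API.

```python
class GlobalNamespace:
    """Hypothetical metadata service (not TidyFS's API): maps a global file
    name to the ordered parts that live on individual machines' local disks."""

    def __init__(self):
        # global name -> ordered list of (machine, local_path) parts
        self._parts = {}

    def add_part(self, name, machine, local_path):
        # Record that the next part of `name` is stored on `machine`'s local disk.
        self._parts.setdefault(name, []).append((machine, local_path))

    def locate(self, name):
        # Return the parts in order so a reader can fetch each one from its machine.
        # Unknown names yield an empty list without creating an entry.
        return list(self._parts.get(name, []))

    def machines_for(self, name):
        # Distinct machines holding any part of `name`, e.g. for locality-aware scheduling.
        return sorted({machine for machine, _ in self._parts.get(name, [])})


ns = GlobalNamespace()
ns.add_part("logs/day1", "node01", "/data/p0")
ns.add_part("logs/day1", "node02", "/data/p1")
print(ns.locate("logs/day1"))
print(ns.machines_for("logs/day1"))
```

Keeping only metadata centrally while the bytes stay on local disks is one plausible reading of "aggregated into a global filesystem"; the snippet does not say how TidyFS itself stores or replicates parts.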
Pfimbi: Accelerating big data jobs through flow-controlled data replication
2016
2016 32nd Symposium on Mass Storage Systems and Technologies (MSST)
We demonstrate that for a job trace derived from a Facebook workload, Pfimbi improves the average job runtime by 18% and by up to 46% in the best case. ...
A key problem is the lack of flexibility in how data replication is performed. ...
Simbarashe Dzinamarira is also supported by a 2015/16 Schlumberger Graduate Fellowship. ...
doi:10.1109/msst.2016.7897074
dblp:conf/mss/DzinamariraDN16
fatcat:p7iiaaq6yvbozd24qfhxzbugnm
Of hammers and nails
2012
Proceedings of the fifth ACM international conference on Web search and data mining - WSDM '12
While relational databases are powerful and flexible tools that support a wide variety of computations, there are computations that benefit from using special-purpose storage systems and others that can ...
This paper presents an empirical study of computations on such large graphs in three well-studied platform models, viz., a relational model, a data-parallel model, and a special-purpose in-memory model ...
DryadLINQ: DryadLINQ [39] is a system for large-scale distributed data-parallel computing using a high-level programming language. ...
doi:10.1145/2124295.2124310
dblp:conf/wsdm/NajorkFHKG12
fatcat:luyr2tg4nnb2fndi4oh6knffsa
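The data-parallel model named above applies the same computation independently to each partition of a dataset and then merges the partial results. A minimal single-machine sketch, assuming threads stand in for cluster workers and a word count stands in for the workload, is shown below. The function names `count_words` and `data_parallel_word_count` are hypothetical; this is not DryadLINQ code, which is expressed in LINQ rather than Python.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(partition):
    # Per-partition stage: runs independently on one partition of the input.
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def data_parallel_word_count(partitions, workers=4):
    # Apply the same stage to every partition concurrently (threads stand in
    # for cluster machines here), then merge the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


parts = [["a b a"], ["b c"], ["a"]]
print(data_parallel_word_count(parts))
```

Because each partition is processed without reference to the others, the per-partition stage could be shipped to whichever machine holds that partition, which is the property a distributed data-parallel runtime exploits.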
Towards a high-performance scalable storage system for workflow applications
2012
Further, a workflow-aware storage system can deliver up to a 3x performance gain compared with a vanilla distributed storage system that is unaware of possible file-level optimizations. ...
Evaluation with synthetic and real workflow applications highlights the significant performance gain attainable by an intermediate storage system and a workflow-aware storage system. ...
The workflow-aware storage system may not yield a performance gain for small files. ...
doi:10.14288/1.0073410
fatcat:yw7yzsn3crhitobppinfqawejy
Operating system support for warehouse-scale computing
2018
First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes. ...
I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources ...
Attributed to A. E. Housman [Ric41, p. 100]. ...
doi:10.17863/cam.26443
fatcat:lvxhwdcmlnbm7d7hg5xdrxqrsa