Putting a "big-data" platform to good use

Mihai Budiu
2012 Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing - HPDC '12  
Figure 1: DryadLINQ software stack (layers: Windows Server; cluster services (remote execution, naming, management); storage [5]; distributed execution [9]; compiler [11,14,16,17]; libraries and frameworks  ...  [1,6,7,13]; scheduling [10]; caching and data management [8,15]; monitoring, debugging [4,12]; applications [2,18]).  ...  We have built a simple distributed filesystem, TidyFS [5]. All cluster machines have local disks, providing persistent storage; these disks are aggregated by TidyFS into a global filesystem.  ... 
doi:10.1145/2287076.2287078 dblp:conf/hpdc/Budiu12 fatcat:aop475bvbvbbbcjbr6tmbb42dq

Pfimbi: Accelerating big data jobs through flow-controlled data replication

Simbarashe Dzinamarira, Florin Dinu, T. S. Eugene Ng
2016 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST)  
We demonstrate that for a job trace derived from a Facebook workload, Pfimbi improves the average job runtime by 18% and by up to 46% in the best case.  ...  A key problem is the lack of flexibility in how data replication is performed.  ...  Simbarashe Dzinamarira is also supported by a 2015/16 Schlumberger Graduate Fellowship.  ... 
doi:10.1109/msst.2016.7897074 dblp:conf/mss/DzinamariraDN16 fatcat:p7iiaaq6yvbozd24qfhxzbugnm

Of hammers and nails

Marc Najork, Dennis Fetterly, Alan Halverson, Krishnaram Kenthapadi, Sreenivas Gollapudi
2012 Proceedings of the fifth ACM international conference on Web search and data mining - WSDM '12  
While relational databases are powerful and flexible tools that support a wide variety of computations, there are computations that benefit from using special-purpose storage systems and others that can  ...  This paper presents an empirical study of computations on such large graphs in three well-studied platform models, viz., a relational model, a data-parallel model, and a special-purpose in-memory model  ...  DryadLINQ [39] is a system for large scale distributed data-parallel computing using a high-level programming language.  ... 
doi:10.1145/2124295.2124310 dblp:conf/wsdm/NajorkFHKG12 fatcat:luyr2tg4nnb2fndi4oh6knffsa

Towards a high-performance scalable storage system for workflow applications

Emalayan Vairavanathan
Further, a workflow-aware storage system can bring up to a 3x performance gain compared to a vanilla distributed storage system that is unaware of the possible file-level optimizations.  ...  Evaluation with synthetic and real workflow applications highlights the significant performance gain attainable by an intermediate storage system and a workflow-aware storage system.  ...  The workflow-aware storage system may not bring a performance gain for small files.  ... 
doi:10.14288/1.0073410 fatcat:yw7yzsn3crhitobppinfqawejy

Operating system support for warehouse-scale computing

Malte Schwarzkopf, Apollo-University Of Cambridge Repository, Steven Hand, Ian Leslie, Robert N. M. Watson
First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes.  ...  I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources  ...  -attributed to A. E. Housman [Ric41, p. 100].  ... 
doi:10.17863/cam.26443 fatcat:lvxhwdcmlnbm7d7hg5xdrxqrsa