Supporting Data Provenance in Data-Intensive Scalable Computing Systems

Matteo Interlandi, Tyson Condie
2018 IEEE Data Engineering Bulletin  
Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time consuming effort.  ...  Introduction Data-Intensive Scalable Computing (DISC) systems, like Apache Hadoop [1] and Apache Spark [2] , are being used to analyze massive quantities of data.  ...  RAMP is more tightly integrated into the target DISC system (e.g., Hadoop MapReduce), providing better scalability, but, like Newt, the RAMP design lacks a unified solution for data and provenance analysis  ... 
dblp:journals/debu/InterlandiC17 fatcat:4m5p7lii5rb55cfxzxghoahsl4

dispel4py: A Python Framework for Data-Intensive Scientific Computing

Rosa Filguiera, Iraklis Klampanos, Amrey Krause, Mario David, Alexander Moreno, Malcolm Atkinson
2014 2014 International Workshop on Data Intensive Scalable Computing Systems  
This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications.  ...  The main aim of dispel4py is to enable scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use.  ...  This work also made use of the Open Science Data Cloud (OSDC), which is an Open Cloud Consortium (OCC)-sponsored project.  ... 
doi:10.1109/discs.2014.12 dblp:conf/sc/FilguieraKKDMA14 fatcat:rw3x2og2ord4nhidkjyr3mu2za

GrayWulf: Scalable Clustered Architecture for Data Intensive Computing

Alexander S. Szalay, Gordon Bell, Jan vandenBerg, Alainna Wonders, Randal C. Burns, Dan Fay, Jim Heasley, Tony Hey, María A. Nieto-Santisteban, Ani Thakar, Catharine van Ingen, Richard Wilton
2009 2009 42nd Hawaii International Conference on System Sciences  
Data-intensive computing presents novel challenges for traditional computing architectures that have focused on FLOPS.  ...  We present the architecture of a database cluster targeted at data-intensive computations with petascale data sets.  ...  Acknowledgements The authors would like to thank Jim Gray for many years of intense collaboration and friendship.  ... 
doi:10.1109/hicss.2009.234 dblp:conf/hicss/SzalayBvWBFHHNTIW09 fatcat:uff3uxfsfvhblmdj6jbqtyeudq

A Bloom Filter Based Scalable Data Integrity Check Tool for Large-Scale Dataset

Sisi Xiong, Feiyi Wang, Qing Cao
2016 2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)  
Large-scale HPC applications are becoming increasingly data intensive.  ...  At Oak Ridge Leadership Computing Facility (OLCF), we are observing the number of files curated under an individual project reaching as high as 200 million and project data size exceeding petabytes  ...  ACKNOWLEDGMENT This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.  ... 
doi:10.1109/pdsw-discs.2016.014 dblp:conf/sc/XiongWC16 fatcat:bm7sevq5vjcy7nelrgkuo7zzhe

The quest for scalable support of data-intensive workloads in distributed systems

Ioan Raicu, Ian T. Foster, Yong Zhao, Philip Little, Christopher M. Moretti, Amitabh Chaudhary, Douglas Thain
2009 Proceedings of the 18th ACM international symposium on High performance distributed computing - HPDC '09  
Data-intensive applications involving the analysis of large datasets often require large amounts of compute and storage resources, for which data locality can be crucial to high throughput and performance  ...  We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data.  ...  We also thank Alex Szalay for the contributions and ideas on the inception of data diffusion.  ... 
doi:10.1145/1551609.1551642 dblp:conf/hpdc/RaicuFZLMCT09 fatcat:vqz7isgmrzfetmq6l4mzlz43jq

Mapping of RAID Controller Performance Data to the Job History on Large Computing Systems

Marc Hartung, Michael Kluge
2014 2014 International Workshop on Data Intensive Scalable Computing Systems  
For systems executing a mixture of different data intensive applications in parallel there is always the question about the impact that each application has on the storage subsystem.  ...  This paper focuses on the analysis of performance data collected on shared system components like global file systems that can not be mapped back to user activities immediately.  ...  For this kind of data it is hard to correlate performance data with applications running on the computing system.  ... 
doi:10.1109/discs.2014.7 dblp:conf/sc/HartungK14 fatcat:nh4uvndia5fphfxdtqz5a2i2zu

CRUCIBLE

Peter Coetzee, Stephen Jarvis
2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems - DISCS-2013  
in data structure and semantics.  ...  The burgeoning field of data science benefits from the application of a variety of analytic models and techniques to the oft-cited problems of large volume, high velocity data rates, and significant variety  ...  Acknowledgments This work was funded under an Industrial EPSRC CASE Studentship, entitled "Platforms for Deploying Scalable Parallel Analytic Jobs over High Frequency Data Streams".  ... 
doi:10.1145/2534645.2534649 dblp:conf/sc/CoetzeeJ13 fatcat:ms55hxypqjbi5bnnyx26m7yai4

Scientific Workflows at DataWarp-Speed: Accelerated Data-Intensive Science Using NERSC's Burst Buffer

Andrey Ovsyannikov, Melissa Romanus, Brian Van Straalen, Gunther H. Weber, David Trebotich
2016 2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)  
To address this problem advanced memory hierarchies, such as burst buffers, have been proposed as intermediate layers between the compute nodes and the parallel file system.  ...  However, as these workflows become more complex, their generated data has grown at an unprecedented rate, making I/O constraints challenging.  ...  This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. DOE under Contract No.  ... 
doi:10.1109/pdsw-discs.2016.005 dblp:conf/sc/OvsyannikovRSWT16 fatcat:tihjnym4knedpk76yq3e5sicne

Distributed Multipath Routing Algorithm for Data Center Networks

Eun-Sung Jung, Venkatram Vishwanath, Rajkumar Kettimuthu
2014 2014 International Workshop on Data Intensive Scalable Computing Systems  
In this paper, we propose a scalable distributed flow scheduling algorithm that can exploit multiple paths in data center networks.  ...  The fast adoption of cloud computing for various applications including high-performance computing applications has drawn more attention to efficient network utilization through adaptive or multipath routing  ...  Department of Energy, Office of Science, Advanced Scientific Computing Research Program, under Contract DE-AC02-06CH11357.  ... 
doi:10.1109/discs.2014.14 dblp:conf/sc/JungVK14 fatcat:vvmthw3durccrd44nywi4lwfme

PSA: A Performance and Space-Aware Data Layout Scheme for Hybrid Parallel File Systems

Shuibing He, Yan Liu, Xian-He Sun
2014 2014 International Workshop on Data Intensive Scalable Computing Systems  
We have implemented PSA within OrangeFS, a popular parallel file system in the HPC domain.  ...  The underlying storage of hybrid parallel file systems (PFS) is composed of both SSD-based file servers (SServer) and HDD-based file servers (HServer).  ...  computing (HPC) domain.  ... 
doi:10.1109/discs.2014.10 dblp:conf/sc/HeLS14 fatcat:74gpqykygjcnthllhshv5yxa7q

Toward scalable monitoring on large-scale storage for software defined cyberinfrastructure

Arnab K. Paul, Steven Tuecke, Ryan Chard, Ali R. Butt, Kyle Chard, Ian Foster
2017 Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems - PDSW-DISCS '17  
We describe here an approach for scalable, lightweight, event detection on large (multi-petabyte) Lustre file systems.  ...  data management policies.  ...  ACKNOWLEDGMENTS This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.  ... 
doi:10.1145/3149393.3149402 dblp:conf/sc/PaulTCBCF17 fatcat:kcrtk6zgczekxies4vg3a3elwy

Methodology for the Rapid Development of Scalable HPC Data Services

Matthieu Dorier, Philip Carns, Kevin Harms, Robert Latham, Robert Ross, Shane Snyder, Justin Wozniak, Samuel Gutierrez, Bob Robey, Brad Settlemyer, Galen Shipman, Jerome Soumagne (+3 others)
2018 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS)  
There is growing evidence in the scientific computing community that parallel file systems are not sufficient for all HPC storage workloads.  ...  This realization has motivated extensive research in new storage system designs.  ...  Specialized data services Specialized data services are already widespread in scientific computing as a means to augment parallel file system functionality.  ... 
doi:10.1109/pdsw-discs.2018.00013 dblp:conf/sc/DorierCHLRSWGRS18 fatcat:agaxisl4dvhunkai6lncj4f6oa

CULZSS-Bit: A Bit-Vector Algorithm for Lossless Data Compression on GPGPUs

Adnan Ozsoy
2014 2014 International Workshop on Data Intensive Scalable Computing Systems  
In this paper, we describe an algorithm to improve dictionary based lossless data compression on GPGPUs.  ...  The presented algorithm uses bit-wise computations and leverages bit parallelism for the core part of the algorithm which is the longest prefix match calculations.  ...  Martin Swany for their valuable insights and advice for applying bit-vector approach on lossless data compression.  ... 
doi:10.1109/discs.2014.9 dblp:conf/sc/Ozsoy14 fatcat:q2czmbd6lrgxrhfyif2t5agz2m

SDAFT

Jiangling Yin, Junyao Zhang, Jun Wang, Wu-chun Feng
2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems - DISCS-2013  
SDAFT is an adaptive, data locality-aware middleware system that employs a scalable distributed file system to supply parallel I/O and dynamically schedules compute processes to access local data by monitoring  ...  SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches.  ...  Acknowledgments This material is based upon work supported by the National Science Foundation under the following NSF program: Parallel Reconfigurable Observational Environment for Data Intensive Super-Computing  ... 
doi:10.1145/2534645.2534647 dblp:conf/sc/YinZWF13 fatcat:nih6b4v2nrfpnhyuw6egisdeq4

A Generic Framework for Testing Parallel File Systems

Jinrui Cao, Simeng Wang, Dong Dai, Mai Zheng, Yong Chen
2016 2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)  
data loss.  ...  Recent studies on local storage systems have exposed various vulnerabilities that could lead to data loss under failure events, which raise the concern for parallel file systems built on top of them.  ...  Data-Intensive Workloads Data-intensive workloads are used to exercise the Lustre file system and generate I/O operations, which is necessary to age the system and bring it to a state that may be difficult  ... 
doi:10.1109/pdsw-discs.2016.013 dblp:conf/sc/CaoWDZC16 fatcat:uigtb6toi5f3fhoaaufhvlfje4