
FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems

Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, Ioan Raicu
2014 2014 IEEE International Conference on Big Data (Big Data)  
It thus is limited for modern data-intensive scientific applications because every I/O needs to be transferred via the network between the compute and storage resources.  ...  critical to the I/O performance of scientific applications.  ...  Department of Energy, Office of Science, Office of Advanced Scientific Computer Research, under contract number DE-AC02-06CH11357.  ... 
doi:10.1109/bigdata.2014.7004214 dblp:conf/bigdataconf/ZhaoZZLWKCRR14 fatcat:znfpquevxzagrl5dkllwi7z7aa

Distributed data provenance for large-scale data-intensive computing

Dongfang Zhao, Chen Shou, Tanu Malik, Ioan Raicu
2013 2013 IEEE International Conference on Cluster Computing (CLUSTER)  
A key issue in evaluating the feasibility of data provenance is its performance, overheads, and scalability.  ...  It has become increasingly important to capture and understand the origins and derivation of data (its provenance).  ...  ACKNOWLEDGEMENT This work was supported by the National Science Foundation under grant OCI-1054974, and used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which  ... 
doi:10.1109/cluster.2013.6702685 dblp:conf/cluster/ZhaoSMR13 fatcat:yxktbz7zu5gf7m2umseien36de

Making a case for distributed file systems at Exascale

Ioan Raicu, Ian T. Foster, Pete Beckman
2011 Proceedings of the third international workshop on Large-scale system and application performance - LSAP '11  
We propose that future high-end computing systems be designed with non-volatile memory on every compute node, allowing every compute node to actively participate in the metadata and data management and  ...  At exascale, basic functionality at high concurrency levels will suffer poor performance, and combined with system mean-time-to-failure in hours, will lead to a performance collapse for large-scale heroic  ...  ACKNOWLEDGMENTS This work was supported in part by the U.S. Dept. of Energy under Contract DE-AC02-06CH11357, as well as the National Science Foundation grant NSF-0937060 CIF-72 and NSF-1054974.  ... 
doi:10.1145/1996029.1996034 fatcat:bon3bizokzckhl5ajrzsqt7tqq

Overcoming Hadoop Scaling Limitations through Distributed Task Execution

Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, Tonglin Li, Michael Lang, Xian-He Sun, Ioan Raicu
2015 2015 IEEE International Conference on Cluster Computing  
This paper aims to address the YARN scaling issues through a distributed task execution framework, MATRIX, which was originally designed to schedule the executions of data-intensive scientific applications  ...  of many-task computing on supercomputers.  ...  ACKNOWLEDGMENT This work was supported by the U.S. Department of Energy contract DE-FC02-06ER25750, and in part by the National Science Foundation (NSF) under awards CNS-1042537 and NSF-1054974.  ... 
doi:10.1109/cluster.2015.42 dblp:conf/cluster/WangLSYZLLSR15 fatcat:zb2wcgjrznh7vaq7ssisaqffgi

SciChain: Trustworthy Scientific Data Provenance [article]

Abdullah Al-Mamun, Dongfang Zhao
2020 arXiv   pre-print
The state-of-the-art for auditing and reproducing scientific applications on high-performance computing (HPC) systems is through a data provenance subsystem.  ...  This paper advocates to leverage blockchains to deliver immutable and autonomous data provenance services such that scientific data are trustworthy.  ...  ACKNOWLEDGEMENT This work is in part supported by a Google Cloud Platform Research Award. This work is also supported by the U.S. Department of Energy (DOE) under contract number DE-SC0020455.  ... 
arXiv:2002.00141v1 fatcat:hazf5vihrzazvb2zm2l56sozpy

Load-balanced and locality-aware scheduling for data-intensive workloads at extreme scales

Ke Wang, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Tonglin Li, Michael Lang, Ioan Raicu
2015 Concurrency and Computation  
Data driven programming models such as many-task computing (MTC) have been prevalent for running data-intensive scientific applications.  ...  Achieving distributed load balancing and best exploiting data-locality are two important goals for the best performance of distributed scheduling of data-intensive applications.  ...  These throughput numbers satisfy the scheduling needs of MTC data-intensive applications towards extreme scales.  ... 
doi:10.1002/cpe.3617 fatcat:ka2sslewpzbyjc4n3yuwq27gsi

ONFS: a hierarchical hybrid file system based on memory, SSD, and HDD for high performance computers

Xin Liu, Yu-tong Lu, Jie Yu, Peng-fei Wang, Jie-ting Wu, Ying Lu
2017 Frontiers of Information Technology & Electronic Engineering  
high performance computing (HPC) storage systems.  ...  With supercomputers developing towards exascale, the number of compute cores increases dramatically, making more complex and larger-scale applications possible.  ...  FusionFS is a distributed file system based on memory in compute nodes. It saves an extreme amount of data movement between compute and storage resources by storing data in memory.  ... 
doi:10.1631/fitee.1700626 fatcat:yqorojwvrvhtfluirqessfcgme


Ning Liu, Adnan Haider, Xian-He Sun, Dong Jin
2015 Proceedings of the 3rd ACM Conference on SIGSIM-Principles of Advanced Discrete Simulation - SIGSIM-PADS '15  
For extreme-scale computing systems like the data centers and supercomputers, the performance is highly dependent on the interconnection networks.  ...  Nowadays, high-performance computing (HPC) system designers are considering using fat-tree as the interconnection network for the next generation supercomputers.  ...  This research also used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S.  ... 
doi:10.1145/2769458.2769474 dblp:conf/pads/LiuHSJ15 fatcat:avxa3xaje5d4pbi73ixa6wzk4i

D5.3: Updated Best Practices for HPC Procurement and Infrastructure

Andreas Johansson
2014 Zenodo  
Task 1 – Assessment of petascale systems – has performed a continuous market watch and analysis of trends in petascale HPC systems worldwide.  ...  -1IP WP8), which have all sought to reach informed decisions within PRACE as a whole on the acquisition and hosting of HPC systems and infrastructure.  ...  exascale computing infrastructures.  ... 
doi:10.5281/zenodo.6572433 fatcat:cwiqrgf33jajjjvhubky4f6rau


Hyogi Sim, Youngjae Kim, Sudharshan S. Vazhkudai, Geoffroy R. Vallée, Seung-Hwan Lim, Ali R. Butt
2017 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17  
In this paper, we present TagIt, a scalable data management service framework aimed at scientific datasets, which is tightly integrated into a shared-nothing distributed file system.  ...  A key feature of TagIt is a scalable, distributed metadata indexing framework, using which we implement a flexible tagging capability to support data discovery.  ...  Solutions such as change logs to automatically capture file system updates have significant performance impact and thus are often not deployed on extreme-scale storage systems.  ... 
doi:10.1145/3126908.3126929 dblp:conf/sc/SimKVVLB17 fatcat:db3r5tzh6veexgbr52qagli3le

Optimizing File System Techniques for Large-Scale Scientific Applications

Avery Ching
2007 unpublished
High-performance scientific computing in a modern age uses parallel techniques at a scale of hundreds of thousands  ...  These large-scale applications have I/O system workloads that are primarily driven by small, sparse I/O operations.  ...  In summary, file systems that intend to provide fault-tolerant large-scale I/O efficiently for scientific computing must provide atomic high-performance I/O methods.  ... 
doi:10.21985/n2243n fatcat:tb35jhywzvggjlti2vdiz3dbwi


Peng Chen, David Crandall, Ryan Newton, Evans, Yuan Luo, Yu Luo, Milinda Pathirage, Zong Peng, Guangchen Ruan, Isuru Suriarachchi, Gabriel Zhou, Jiaan (+4 others)
2016 unpublished
Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks  ...  It presents a stream processing framework for online processing of provenance data at high receiving rate.  ...  E-Science (or eScience) is data intensive, computationally-based science where computer science meets applications, and brings advances in both fields [75] .  ... 

Particle and cell manipulation in microfluidics: patterning and trapping using ultrasound

With the application of the physics of fluids at the micro-scale, microfluidics provides visual accessibility and high-resolution control for particle manipulation.  ...  This thesis develops efficient numerical modelling to study unexplored physical effects of resonance frequencies on inter-particle forces.  ...  I felt your support from the first piece of communication we had. It has been a great privilege for me to have such a constant source of support.  ... 
doi:10.26180/5f6ddc11b7549 fatcat:bhxljafr7fcnvovxjggeekmkge