Cloud resource management: towards efficient execution of large-scale scientific applications and workflows on complex infrastructures

Nelson Mimura Gonzalez, Tereza Cristina Melo de Brito Carvalho, Charles Christian Miers
2017 Journal of Cloud Computing: Advances, Systems and Applications  
Workflows have emerged as a way to formalize and structure data analysis, thus becoming an increasingly popular paradigm for scientists to handle complex scientific processes.  ...  multicloud environments to support large-scale execution of workflows, performance fluctuations, and reliability, pose as challenges to truly position clouds as viable high-performance infrastructures for  ...  The I/O cost in terms of storage and time to implement checkpointing are far from negligible.  ... 
doi:10.1186/s13677-017-0081-4 fatcat:oy36rd3zerc2rfx7rhpdqxvkte

Benchmarking Parallel I/O Performance for a Large Scale Scientific Application on the TeraGrid [chapter]

Frank Löffler, Jian Tao, Gabrielle Allen, Erik Schnetter
2010 Lecture Notes in Computer Science  
This limitation occurs at a low percentage of the computational size of the machines, which shows that at least for the application used for this paper the I/O system can be an important limiting factor  ...  It is seen that the I/O performance of our production code scales very well, but is limited by the I/O system itself at some point.  ...  Thanks also has to go to the system level experts from each of the three TeraGrid centers used for this work for their support and advice: Ariel Martinez, Jr. for QueenBee, Yaakoub El Khamra for Ranger  ... 
doi:10.1007/978-3-642-11842-5_37 fatcat:nmgfu6vtebd5jdkxj2qf3ozi6u

24/7 Characterization of petascale I/O workloads

Philip Carns, Robert Latham, Robert Ross, Kamil Iskra, Samuel Lang, Katherine Riley
2009 2009 IEEE International Conference on Cluster Computing and Workshops  
In this work we demonstrate Darshan's ability to characterize the I/O behavior of four scientific applications and show that it induces negligible overhead for I/O intensive jobs with as many as 65,536  ...  Tools that can help users better understand the behavior of their application with respect to I/O have not yet reached the level of utility necessary to play a central role in application development and  ...  , and the Applied Numerical Algorithms Group at NERSC for providing the Chombo I/O benchmark.  ... 
doi:10.1109/clustr.2009.5289150 dblp:conf/cluster/CarnsLRILR09 fatcat:kvugdje7czggzp52ovckowcf2q

Author index

2012 2012 19th International Conference on High Performance Computing  
Multi-Gateway System Foerster, Kyle Password Recovery Using MPI and CUDA Ghoshal, Devarshi Visualization of Network Data Provenance Top Gopalan, Sajith I/O Efficient QR and QZ Algorithms Grey, Ryan Massively  ...  stores Indarapu, Siva Rama Krishna Bharadwaj Sparse Matrix-Matrix Multiplication on Modern Architectures Ionkov, Latchesar The Design and Implementation of a Multi-level Content-Addressable Checkpoint  ... 
doi:10.1109/hipc.2012.6507473 fatcat:7k6al4ozjbecrjykky7kyd5b7e

Coordinating System Software for Power Savings

Lingxiang Xiang, Jiangwei Huang, Tianzhou Chen
2008 2008 Second International Conference on Future Generation Communication and Networking  
toward file-grain power optimizations.  ...  Thus, the optimizations working at distinct levels can be overlaid at run time, and the power reduction effect can be enhanced.  ...  Our system for energy efficient I/O is illustrated in In each application, the code regions are partitioned statically by the compiler according to different I/O behaviors at compile stage.  ... 
doi:10.1109/fgcn.2008.127 dblp:conf/fgcn/XiangHC08 fatcat:2kecyuhchvgfhhl7zpezjpqzce

The Landscape of Exascale Research

Stijn Heldens, Pieter Hijma, Ben Van Werkhoven, Jason Maassen, Adam S. Z. Belloum, Rob V. Van Nieuwpoort
2020 ACM Computing Surveys  
Overall, we observe that great advancements have been made in tackling the two primary exascale challenges: energy efficiency and fault tolerance.  ...  [159] consider non-blocking checkpointing (i.e., overlapping I/O with computation) and find high efficiency even when I/O bandwidth is limited. Dong et al.  ...  [46] optimize multi-level checkpointing for performance and energy-efficiency and find that optimizing both metrics simultaneously is not possible.  ... 
doi:10.1145/3372390 fatcat:jhtwt7pxd5c5darhz75hiqgsnq

Author Index

2008 2008 IEEE International Symposium on Parallel and Distributed Processing  
Workloads on Heterogeneous Platforms under Bounded Multi-Port Model Fahey, Mark I/O Performance on a Massively Parallel Cray XT3/XT4 Fan, Jie Fault Tolerant Practices on Network Processors for Dependable  ...  Optimization of Parallel I/O on the Cray XT Vialle, Stéphane Large Scale Distribution of Stochastic Control Algorithms for Gas Storage Valuation Vikram, Krishna Partial Run-time Reconfiguration of FPGA  ... 
doi:10.1109/ipdps.2008.4536576 fatcat:7unikf5ywjhjtdd6xtrmcom3gq

2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32

2022 IEEE Transactions on Parallel and Distributed Systems  
Sun, W., +, TPDS Nov. 2021 2623-2626 ETICA: Efficient Two-Level I/O Caching Architecture for Virtualized Platforms.  ...  Ahmadi, A., +, TPDS June 2021 1452-1464 Linux A Thread Level SLO-Aware I/O Framework for Embedded Virtualization.  ... 
doi:10.1109/tpds.2021.3107121 fatcat:e7bh2xssazdrjcpgn64mqh4hb4

Support for adaptivity in ARMCI using migratable objects

Chao Huang, Chee Wai Lee, L.V. Kale
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
For example, Charm++ supports dynamic load balancing via an intelligent adaptive run-time system.  ...  This paper presents our preliminary work on integrating and supporting ARMCI with the adaptive run-time system of Charm++ as a part of our overall effort in the multi-paradigm approach.  ...  The total amount of data varies, but we observe that the total bandwidth for disk I/O does not scale.  ... 
doi:10.1109/ipdps.2006.1639720 dblp:conf/ipps/HuangLK06 fatcat:td5d4k5kpbc23dfujyetkvvgy4

Application monitoring and checkpointing in HPC

William M. Jones, John T. Daly, Nathan DeBardeleben
2012 Proceedings of the 50th Annual Southeast Regional Conference on - ACM-SE '12  
One commonly used mechanism for providing application fault tolerance in parallel systems is the use of checkpointing.  ...  We demonstrate the impact of sub-optimal checkpoint intervals on application efficiency via simulation with real workload data.  ...  In our simulation, each time a job is dispatched, its checkpoint interval is set according to Equation 1, where δ is set to ten minutes, a reasonable assumption given to total RAM, disk I/O, and network  ... 
doi:10.1145/2184512.2184574 dblp:conf/ACMse/JonesDD12 fatcat:xfu3ppbqgrhovn3whysjljirmu

2020 Index IEEE Transactions on Parallel and Distributed Systems Vol. 31

2021 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS Jan. 2019 2-14 An I/O Efficient Distributed Approximation Framework Using Cluster Sampling.  ...  ., +, TPDS March 2019 692-709 Towards Efficient Multi-Channel Data Broadcast for Multimedia Streams.  ... 
doi:10.1109/tpds.2020.3033655 fatcat:cpeatdjlpzhqdersvsk5nmzjkm

Making a case for distributed file systems at Exascale

Ioan Raicu, Ian T. Foster, Pete Beckman
2011 Proceedings of the third international workshop on Large-scale system and application performance - LSAP '11  
At exascale, basic functionality at high concurrency levels will suffer poor performance, and combined with system mean-time-to-failure in hours, will lead to a performance collapse for large-scale heroic  ...  We want to thank our collaborators for the valuable help, feedback, and insight leading up to this work, namely Mike Wilde, Matei Ripeanu, Arthur Barney Maccabe, Marc Snir, Rob Ross, Kamil Iskra, and Alok  ...  Studies into methods to deal with small, unaligned I/O and mixed-size I/O workloads as well as collaborative caching are also needed.  ... 
doi:10.1145/1996029.1996034 fatcat:bon3bizokzckhl5ajrzsqt7tqq

Runtime Aware Architectures

Mateo Valero Cortes
2018 Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation - SIGSIM-PADS '18  
In the paper, we introduce a first approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective.  ...  Current multi-cores are designed as simple symmetric multiprocessors (SMP) on a chip. However, we believe that this is not enough to overcome all the problems that multi-cores face.  ...  Performance can also improve if the pipeline can be extended to asynchronously execute sequential I/O intensive regions.  ... 
doi:10.1145/3200921.3204479 dblp:conf/pads/Cortes18 fatcat:ctgvsceil5cgxpba7hhoy5f3ae

A Survey on Resiliency Techniques in Cloud Computing Infrastructures and Applications

Carlos Colman-Meixner, Chris Develder, Massimo Tornatore, Biswanath Mukherjee
2016 IEEE Communications Surveys and Tutorials  
., US$ 25.5 billion in 2010 for North America).  ...  The second major part of the paper introduces and categorizes a large number of techniques for cloud computing infrastructure resiliency.  ...  An approach using this technique is checkpointing orchestration (CO) suggested in [163] that minimizes the I/O contention of concurrent checkpoint data from distributed and large cloud infrastructure  ... 
doi:10.1109/comst.2016.2531104 fatcat:vzvkai7nkrbbda63fesn7zw4di

FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing [chapter]

Carsten Weinhold, Adam Lackorzynski, Jan Bierbaum, Martin Küttler, Maksym Planeta, Hannes Weisbach, Matthias Hille, Hermann Härtig, Alexander Margolin, Dror Sharf, Ely Levy, Pavel Gak (+7 others)
2020 Lecture Notes in Computational Science and Engineering  
It further includes global, distributed platform management and system-level optimization services that transparently minimize checkpoint/restart overhead for applications.  ...  First, we published on efficient collective operations in the presence of failures [27, 31, 43]. Second, we continued research on scalable checkpointing, where we concentrated on global coordination  ...  The authors acknowledge the Jülich Supercomputing Centre, the Gauss Centre for Supercomputing, the John von Neumann Institute for Computing, and the Swiss National Supercomputing Centre (CSCS) for providing  ... 
doi:10.1007/978-3-030-47956-5_16 fatcat:accs6fosezfuzme75yc63nrkum