
Modeling and optimization of non-blocking checkpointing for optimistic simulation on Myrinet clusters

Francesco Quaglia, Andrea Santoro
2003 Proceedings of the 17th annual international conference on Supercomputing - ICS '03  
Checkpointing and Communication Library (CCL) is a recently developed software package implementing CPU-offloaded checkpointing functionality in support of optimistic parallel simulation on Myrinet clusters.  ...  In this paper we present a cost model for non-blocking checkpointing and derive a performance-effective re-synchronization semantic, which we call minimum cost re-synchronization (MC).  ...  SUMMARY. In this paper we have presented a performance model for non-blocking (i.e., DMA-based) checkpointing in support of optimistic parallel discrete event simulation on Myrinet clusters.  ...
doi:10.1145/782814.782834 dblp:conf/ics/QuagliaS03 fatcat:c46jkvxxfrcijkbwxb4wfcutzm

Modeling and optimization of non-blocking checkpointing for optimistic simulation on Myrinet clusters

Francesco Quaglia, Andrea Santoro
2005 Journal of Parallel and Distributed Computing  
Checkpointing and Communication Library (CCL) is a recently developed software package implementing CPU-offloaded checkpointing functionality in support of optimistic parallel simulation on Myrinet clusters.  ...  In this paper we present a cost model for non-blocking checkpointing and derive a performance-effective re-synchronization semantic, which we call minimum cost re-synchronization (MC).  ...  SUMMARY. In this paper we have presented a performance model for non-blocking (i.e., DMA-based) checkpointing in support of optimistic parallel discrete event simulation on Myrinet clusters.  ...
doi:10.1016/j.jpdc.2005.02.006 fatcat:kpnacp3muvclzb7nfnnu76xhca

Modeling and optimization of non-blocking checkpointing for optimistic simulation on Myrinet clusters

Francesco Quaglia, Andrea Santoro
2003 Proceedings of the 17th annual international conference on Supercomputing - ICS '03  
Checkpointing and Communication Library (CCL) is a recently developed software package implementing CPU-offloaded checkpointing functionality in support of optimistic parallel simulation on Myrinet clusters.  ...  In this paper we present a cost model for non-blocking checkpointing and derive a performance-effective re-synchronization semantic, which we call minimum cost re-synchronization (MC).  ...  SUMMARY. In this paper we have presented a performance model for non-blocking (i.e., DMA-based) checkpointing in support of optimistic parallel discrete event simulation on Myrinet clusters.  ...
doi:10.1145/782832.782834 fatcat:5uzhq3oo7vge5geggf6wgvfvlm

Performance evaluation of automatic checkpoint-based fault tolerance for AMPI and Charm++

Gengbin Zheng, Chao Huang, Laxmikant V. Kalé
2006 ACM SIGOPS Operating Systems Review  
As the size of high-performance clusters multiplies, the probability of system failure grows substantially, posing an increasingly significant challenge for scalability.  ...  However, the application developer is required to write significant additional code for checkpointing and restarting.  ...  Acknowledgements. This work was supported in part by DOE (Grants B341494 and B505214) and the National Science Foundation (NGS 0103645 and ITR 0205611).  ...
doi:10.1145/1131322.1131340 fatcat:ryv4mhqvqjejhjuzmffvuzj2dy

Synchronization methods in parallel and distributed discrete-event simulation

Shafagh Jafer, Qi Liu, Gabriel Wainer
2013 Simulation Modelling Practice and Theory
Jafer et al. / Simulation Modelling Practice and Theory 30 (2013) 54-73  ...  for the parallel discrete-event simulations.  ...  In the same vein, two checkpointing mechanisms are defined, including a clustered checkpointing mechanism that saves states for the LPs only when remote events are received from other clusters, and a local  ... 
doi:10.1016/j.simpat.2012.08.003 fatcat:azaynxjyingybamdj2whgo5bme

Noncontiguous locking techniques for parallel file systems

Avery Ching, Wei-keng Liao, Alok Choudhary, Robert Ross, Lee Ward
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
Current atomic I/O solutions are not optimized for handling noncontiguous access patterns because current locking systems have a fixed file system block-based granularity and do not leverage high-level  ...  In this paper we present a hybrid lock protocol that takes advantage of new list and datatype byte-range lock description techniques to enable high-performance atomic I/O operations for these challenging  ...  Frank Mueller, for his guidance. This work was supported in part by DOE's  ...
doi:10.1145/1362622.1362658 dblp:conf/sc/ChingLCRW07 fatcat:6ccnageaffgexi5zsi7pvcnek4

MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI

A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, F. Cappello
2006 The International Journal of High Performance Computing Applications
We measure the performance of these protocols on a microbenchmark and compare them on the NAS benchmarks, using an original fault tolerance test.  ...  MPI is one of the most widely used message-passing libraries in HPC applications. These two trends raise the need for fault-tolerant MPI.  ...  Other performance evaluations concerning the different components (Channel Memory, Checkpoint Servers, etc.) and the impact of blocking and non-blocking checkpointing on the execution time are presented in our  ...
doi:10.1177/1094342006067469 fatcat:6fmggl6mz5djhjpzgvvj3ezc5i

Space uncertain simulation events

Francesco Quaglia, Roberto Beraldi
2004 Proceedings of the Workshop on Parallel and Distributed Simulation (PADS)
Also, experimental results for optimistic simulations of a Personal Communication System (PCS) modeled with space uncertain events are reported.  ...  In this paper we propose the concept of "spatial uncertainty", expressed as the possibility for a simulation event to occur at one of a set of points within the simulated system space.  ...  Testing Environment. The experiments were all performed on a Myrinet cluster of 4 Pentium III 866 MHz machines (256 MB RAM).  ...
doi:10.1145/1013329.1013359 fatcat:t45zjrdbsvfati426b6q77z5xq

Efficient Master/Worker Parallel Discrete Event Simulation

Alfred Park, Ric Fujimoto
2009 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation  
It has truly been an honor and a blessing to perform research under the supervision of one of the pioneers in the parallel and distributed simulation field.  ...  I am extremely grateful to have had the opportunity to study as one of his students.  ...  computational workloads that, in the past, were reserved for processing on dedicated clusters.  ...
doi:10.1109/pads.2009.9 dblp:conf/pads/ParkF09 fatcat:6aiga6uc6jdy5adfzhujiqab3a

Fault Tolerance for Remote Memory Access Programming Models [article]

Maciej Besta, Torsten Hoefler
2020 arXiv pre-print
We use this model to construct several highly-scalable mechanisms that provide efficient low-overhead in-memory checkpointing, transparent logging of remote memory accesses, and a scheme for transparent  ...  However, little work exists on resilience schemes for RMA-based applications and systems.  ...  ACKNOWLEDGEMENTS We thank the CSCS team granting access to the Monte Rosa machine, and for their excellent technical support. We thank Franck Cappello for inspiring remarks.  ... 
arXiv:2010.09025v1 fatcat:5dwamllqe5a6hpspu4uzs3f6ka

The Wisconsin Wind Tunnel project

Mark D. Hill, James R. Larus, David A. Wood
1994 SIGARCH Computer Architecture News  
on-line.  ...  This document lists contributors to the Wisconsin Wind Tunnel Project, gives a brief description of the project, and presents references and abstracts to its principal papers, including how to obtain them  ...  We conclude by speculating on the performance of optimistic simulation when simulating (1) target network details, and (2) on hosts with high message latencies and no synchronization hardware.  ... 
doi:10.1145/192537.192543 fatcat:rvtgkgeonnba3cdbociaiglrdq

Performance of machines for lattice QCD simulations [article]

Tilo Wettig
2005 arXiv pre-print
We review the architecture of massively parallel machines used for lattice QCD simulations and present benchmarks for the performance of popular algorithms on these platforms.  ...  We cover commercial supercomputers, PC clusters, and custom-designed machines. We also speculate on future developments.  ...  Using these lean libraries and PCI-Express, typical latencies are on the order of 12 µs for Gig-E, 5 µs for Myrinet, and 3.5 µs for Infiniband.  ... 
arXiv:hep-lat/0509103v1 fatcat:s4bka4ljezezrgcxittcvp5dju

Improved message logging versus improved coordinated checkpointing for fault tolerant MPI

P. Lemarinier, A. Bouteiller, T. Herault, G. Krawezik, F. Cappello
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)  
Several protocols provide automatic and transparent fault detection and recovery for message passing systems, with different impacts on application performance and on the capacity to tolerate a high fault rate  ...  stress of coordinated checkpoint.  ...  The MPICH-V project is partially funded, through the CGP2P project, by the French ACI initiative on GRID of the ministry of research.  ...
doi:10.1109/clustr.2004.1392609 dblp:conf/cluster/LemarinierBHKC04 fatcat:txn7eolanfcr3c46j7vjytqboy

Architectural specification for massively parallel computers: an experience and measurement-based approach

Ron Brightwell, William Camp, Benjamin Cole, Erik DeBenedictis, Robert Leland, James Tomkins, Arthur B. Maccabe
2005 Concurrency and Computation  
We contrast our approach of leveraging high-volume, mass-market commodity processors with that taken for the Earth Simulator.  ...  We present a comparison of benchmarks and application performance that supports our approach. We also project the performance of Red Storm and the Earth Simulator.  ...  The most achievable way of providing such connectivity is a non-blocking crossbar switch.  ...
doi:10.1002/cpe.893 fatcat:pzr2kymiajeshassp7oiinr2oi

Memory leads the way to better computing

H.-S. Philip Wong, Sayeef Salahuddin
2015 Nature Nanotechnology  
I am honored to have been part of this study, and wish to thank the study members for their passion for the subject, and for contributing far more of their precious time than they expected. Peter M.  ...  FOREWORD. This document reflects the thoughts of a group of highly talented individuals from universities, industry, and research labs on what might be the challenges  ...  The latter is especially important with the growing interest in multi-physics simulations, in which separate models, such as an ocean and wind dynamics model in a climate simulation, are combined to model  ...
doi:10.1038/nnano.2015.29 pmid:25740127 fatcat:d6iiuuwcozbxlgn4kxxzdzwd4m
Showing results 1-15 of 32.