
Building a fault tolerant application using the GASPI communication layer [article]

Faisal Shahzad, Moritz Kreutzer, Thomas Zeiser, Rui Machado, Andreas Pieper, Georg Hager, Gerhard Wellein
2015 arXiv   pre-print
In our work we use GASPI, which is a relatively new communication library based on the PGAS model. It provides the missing features to allow the design of fault-tolerant applications.  ...  Instead of introducing algorithm-based fault tolerance in its true sense, we demonstrate how we can build on (existing) clever checkpointing and extend applications to integrate a low cost fault  ...  In this work, we use the GPI-2 [10] implementation of the GASPI specification [11] to build a fault tolerant application that is capable of recovering dynamically from process failures.  ... 
arXiv:1505.04628v1 fatcat:iho7lw33y5aa7o74ztiwdzoeye

Optimization of Computationally and I/O Intense Patterns in Electronic Structure and Machine Learning Algorithms

Michal Pitonak, Marian Gall, Adrian Rodriguez-Bazaga, Valeria Bartsch
2019 Zenodo  
fault tolerant mode.  ...  Development of scalable High-Performance Computing (HPC) applications is already a challenging task even in the pre-Exascale era.  ...  Acknowledgements This work was financially supported by the PRACE project funded in part by the EU's Horizon 2020 Re-  ... 
doi:10.5281/zenodo.2807937 fatcat:5szkqofx3bcqpaypsnrqnjtbue

Performance Evaluation of an Algorithm-based Asynchronous Checkpoint-Restart Fault Tolerant Application Using Mixed MPI/GPI-2 [article]

Adrian Bazaga, Michal Pitonak
2018 arXiv   pre-print
with Fault Tolerance features.  ...  C libraries that create an interface to encapsulate these file system functionalities, and using the GPI-2 implementation for the GASPI protocol and its in-memory checkpointing library to provide an application  ...  [12] presents how to build fault tolerant applications by using FT-MPI with a coding approach.  ... 
arXiv:1804.11312v3 fatcat:gnnx5rsrq5al3idi3rrghs3soy

MALT: distributed data-parallelism for existing ML applications

Hao Li, Asim Kadav, Erik Kruus, Cristian Ungureanu
2015 Proceedings of the Tenth European Conference on Computer Systems - EuroSys '15  
In our results, we show MALT provides fault tolerance, network efficiency and speedup to these applications.  ...  Building such models on a single machine is often impractical because of the large amount of computation required.  ...  Acknowledgments We would like to thank our shepherd Derek Murray and the anonymous reviewers for the useful feedback.  ... 
doi:10.1145/2741948.2741965 dblp:conf/eurosys/LiKKU15 fatcat:vczbxlmkm5gtdlp6gisf5ingby

D7.3: Evaluation of Tools and Techniques for Future Exascale Systems

Venkatesh Kannan, Nawar Akhras
2019 Zenodo  
The application codes were selected with a focus on the European scientific and engineering research communities by working with European Centres of Excellence (CoEs).  ...  PRACE-3IP WP7 as reported in D7.2.1 'A Report on the Survey of HPC Tools and Techniques'.  ...  To address this, we have explored GASPI, which enables switching from traditional synchronous, two-sided MPI communication to one-sided, asynchronous and fault-tolerant execution using RDMA (Remote Data  ... 
doi:10.5281/zenodo.6805996 fatcat:ubpjbapmc5gb7dlpq6eibaignm

Towards an Exascale Enabled Sparse Solver Repository [chapter]

Jonas Thies, Martin Galgon, Faisal Shahzad, Andreas Alvermann, Moritz Kreutzer, Andreas Pieper, Melven Röhrig-Zöllner, Achim Basermann, Holger Fehske, Georg Hager, Bruno Lang, Gerhard Wellein
2016 Lecture Notes in Computational Science and Engineering  
Key features of the ESSR include holistic performance engineering, tight integration between software layers and mechanisms to mitigate hardware failures.  ...  We discuss the development of a new 'Exascale enabled' sparse solver repository (the ESSR) that addresses these challenges-from fundamental design considerations and development processes to actual implementations  ...  This work was supported by the German Research Foundation (DFG) through the Priority Programs 1648 "Software for Exascale Computing" under project ESSEX.  ... 
doi:10.1007/978-3-319-40528-5_13 fatcat:jancdp27w5hktf5y6utn533zwi

DASH: Data Structures and Algorithms with Support for Hierarchical Locality [chapter]

Karl Fürlinger, Colin Glass, Jose Gracia, Andreas Knüpfer, Jie Tao, Denis Hünich, Kamran Idrees, Matthias Maiterth, Yousri Mhedheb, Huan Zhou
2014 Lecture Notes in Computer Science  
The DASH library is implemented on top of our runtime system DART, which provides an abstraction layer on top of existing one-sided communication substrates.  ...  Operator overloading is used to provide global-view PGAS semantics without the need for a custom PGAS (pre-)compiler.  ...  GASPI [12] is an effort to standardize an API for PGAS programming developed by Fraunhofer; it features support for fault tolerance by supporting timeouts for all non-local operations.  ... 
doi:10.1007/978-3-319-14313-2_46 fatcat:mt4xioywinccxkckb3rvnglls4

ESSEX: Equipping Sparse Solvers for Exascale [chapter]

Andreas Alvermann, Achim Basermann, Holger Fehske, Martin Galgon, Georg Hager, Moritz Kreutzer, Lukas Krämer, Bruno Lang, Andreas Pieper, Melven Röhrig-Zöllner, Faisal Shahzad, Jonas Thies (+1 others)
2014 Lecture Notes in Computer Science  
The project pursues a coherent co-design of all software layers where a holistic performance engineering process guides code development across the classic boundaries of application, numerical method,  ...  Within ESSEX the numerical methods cover widely applicable solvers such as classic Krylov, Jacobi-Davidson, or the recent FEAST methods, as well as domain-specific iterative schemes relevant for the ESSEX  ...  This work is supported by the German Research Foundation (DFG) through the Priority Programme 1648 "Software for Exascale Computing" (SPPEXA) under project ESSEX.  ... 
doi:10.1007/978-3-319-14313-2_49 fatcat:57wevi4tuzaejnisisllx4d6be

A view of programming scalable data analysis: from clouds to exascale

Domenico Talia
2019 Journal of Cloud Computing: Advances, Systems and Applications  
in the near future Exascale systems will be used to implement extreme-scale data analysis.  ...  Scalability is a key feature for big data analysis and machine learning frameworks and for applications that need to analyze very large and real-time data available from data repositories, social media  ...  Availability of data and materials Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.  ... 
doi:10.1186/s13677-019-0127-x fatcat:l5mimqzwibh7fn4fedlsz4jkji

Paving the Way Towards a Highly Energy-Efficient and Highly Integrated Compute Node for the Exascale Revolution: The ExaNoDe Approach

Alvise Rigo, Christian Pinto, Kevin Pouget, Daniel Raho, Denis Dutoit, Pierre-Yves Martinez, Chris Doran, Luca Benini, Iakovos Mavroidis, Manolis Marazakis, Valeria Bartsch, Guy Lonsdale (+8 others)
2017 2017 Euromicro Conference on Digital System Design (DSD)  
Power consumption and high compute density are the key factors to be considered when building a compute node for the upcoming Exascale revolution.  ...  , heterogeneous co-processors and using advanced hardware integration technologies with the novel UNIMEM Global Address Space memory system.  ...  The work presented in this paper reflects only authors' view and the European Commission is not responsible for any use that may be made of the information it contains.  ... 
doi:10.1109/dsd.2017.37 dblp:conf/dsd/RigoPPRDMDBMMBL17 fatcat:jumekx7n6vcmvnu32epz3tp6py

Challenges and Opportunities of User-Level File Systems for HPC (Dagstuhl Seminar 17202)

André Brinkmann, Kathryn Mohror, Weikuan Yu, Marc Herbstritt
2017 Dagstuhl Reports  
We had a lively week of learning about each other's approaches as well as unique I/O use cases that can influence the design of community-driven file and storage system standards.  ...  This is because it is relatively easy to swap in new, specialized user-level file systems for use by applications on a case-by-case basis, as opposed to the current mainstream approach of using general-purpose  ...  GPI provides a standardized API (GASPI) and hides latencies by asynchronous one-sided RDMA communication.  ... 
doi:10.4230/dagrep.7.5.97 dblp:journals/dagstuhl-reports/BrinkmannMY17 fatcat:2bquax3oz5c5xlsoxdkiomotiy

D2.5: Updated Stakeholder Management in PRACE 2

Philippe Segers, Jean-Philippe Nominé
2017 Zenodo  
, applications.  ...  A first analysis has been documented within a previous deliverable issued in February 2017, but since then major initiatives have moved forward, that will reshape the European HPC ecosystem: the main ones  ...  The project approach is based on generic multiscale computing patterns which allow the implementation of customized algorithms to optimise load balancing, data handling, fault tolerance and energy consumption  ... 
doi:10.5281/zenodo.6801678 fatcat:nvhiqdnafjbo7be5det3kphezi

D2.4: Stakeholder Management in PRACE 2.0

Jean-Philippe Nominé, Philippe Segers
2017 Zenodo  
High Performance Computing (HPC) – infrastructures, technologies, applications.  ...  The recent launching of a Digital Single Market (DSM) strategy for Europe, announced by the EC in April 2016, provides a new horizon with an extended and strengthened need and interest for HPC.  ...  communities in Europe, working across all layers of the system stack and at the same time, fuelling new industries in HPC.  ... 
doi:10.5281/zenodo.6801676 fatcat:5z7huwaa5vg57go37wtxbkz6iu

Parallel Asynchronous Matrix Multiplication for a Distributed Pipelined Neural Network

Anke Mareike Schmidtobreick
The method used for its training thereby relies on the stochastic gradient descent method and is combined with a block-wise distribution of the network layers to groups of processes, as well as a pipelining  ...  A description of the matrix multiplication algorithm based on a PGAS model with one-sided and asynchronous communication, written in C and GASPI, is presented.  ...  Additionally, the fault tolerance of GASPI may provide better future prospects, especially regarding the application on large clusters.  ... 
doi:10.11588/heidok.00023737 fatcat:wwsimf7llrhnlkasfere4nbm5m

D5.1: Market and Technology Watch Report Year 1

Jean-Philippe Nominé
2016 Zenodo  
It is thus the continuation of a well-established effort, using assessment of the HPC market based on market surveys, supercomputing conferences, and exchanges with vendors and between experts involved  ...  This deliverable is the first one of PRACE-4IP Work Package 5 Task 1; it corresponds to a periodic annual update on technology and market trends.  ...  The deliverables also include a sustainable set of methods and tools for cross-cutting issues such as scheduling, auto-tuning, and algorithm-based fault tolerance packaged into open-source library modules  ... 
doi:10.5281/zenodo.6801690 fatcat:zpnjoenqkvb2te74rvci326vba