Investigating Applications on the A64FX [article]

Adrian Jackson, Michèle Weiland, Nick Brown, Andrew Turner, Mark Parsons
2020 arXiv pre-print
We investigate the performance of complex scientific applications across multiple nodes, as well as single node and mini-kernel benchmarks.  ...  However, this is not true for all the benchmarks we have undertaken. Furthermore, the specific configuration of applications can have an impact on the runtime and performance experienced.  ...  applications in the UK.  ... 
arXiv:2009.11806v1 fatcat:zv2oquhjjngovjwicm3zniqqoi

Modern server ARM processors for supercomputers: A64FX and others. Initial data of benchmarks

Mikhail Borisovich Kuzminsky
2022 Program Systems: Theory and Applications
The HPC performance review focuses primarily on benchmarks and applications for the A64FX, which supports longer vectors than other ARM processors and has higher peak performance.  ...  A comparative analysis of the performance of ARM server processors used on supercomputers or also aimed at high-performance computing (HPC) is given.  ...  In [97] on the A64FX/1.8 GHz, performance was investigated on an improved sorting algorithm using vectorization with work on large and small arrays.  ... 
doi:10.25209/2079-3316-2022-13-1-131-194 fatcat:fr4ypewxnfgb5h2jtvuacxhhuq

A64FX – Your Compiler You Must Decide! [article]

Jens Domke
2021 arXiv pre-print
Our measurements show that orders of magnitude in performance can be gained by deviating from the recommended usage model of the A64FX compute nodes.  ...  The current number one of the TOP500 list, Supercomputer Fugaku, has demonstrated that CPU-only HPC systems aren't dead and CPUs can be used for more than just being the host controller for a discrete  ...  software installation and application debugging.  ... 
arXiv:2107.07157v2 fatcat:lopyleh3lffl3jq5qm65va6wsu

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache [article]

Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka
2022 arXiv pre-print
HPC applications, on a per-chip basis.  ...  We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM.  ...  ; and (iv) we find that half (29 out of 52) of the simulated applications experience a ≥ 2x speedup on LARC's Core Memory Group (CMG) compared to our A64FX CMG baseline at one-fourth of the area.  ... 
arXiv:2204.02235v1 fatcat:fnr2lc264bdwhhb6o6t3dt24xa

Message from EA-HPC Workshop Organizers

2021 2021 IEEE International Conference on Cluster Computing (CLUSTER)  
This year's program includes benchmarking evaluations, experiences with new A64FX systems, compiler studies, and investigations of mapping large-scale scientific applications to new A64FX clusters.  ...  As with last year, the arrival of the Fujitsu A64FX processor and the Fugaku supercomputer have provided a new target and new opportunities for Arm-based HPC research.  ...  part of the Arm HPC User Group.  ... 
doi:10.1109/cluster48925.2021.00016 fatcat:q2vr2mxbendhhpup2mbrd4l4vi

Ookami - The First Year of a Computing Technology Testbed

Eva Siegmann, Robert Harrison
2021 Zenodo  
One of the key features of the Fujitsu A64FX processor is SVE (scalable vector extension).  ...  There are several toolchains (GNU, Arm, Cray, Fujitsu), flavors of MPI, debuggers, and profiling tools available on the system, allowing users to investigate the performance of their applications in detail  ...  by NSF ● Available for researchers worldwide (excluding ITAR prohibited countries & restricted parties on the EAR entity list) ● Usage is free for non-commercial and limited commercial purposes Fugaku  ... 
doi:10.5281/zenodo.5796325 fatcat:sashpb5acfebxnbuqcng62y2iu

Co-design and System for the Supercomputer Fugaku

Mitsuhisa Sato, Yuetsu Kodama, Miwako Tsuji, Tetsuya Odajima
2021 IEEE Micro  
We have designed an original manycore processor based on Armv8 instruction sets with the Scalable Vector Extension, A64FX processor, with Fujitsu, our industry partner.  ...  While "Fugaku" was ranked first for several benchmarks such as TOP500, HPCG, HPL-AI, and Graph500 in 2020, the major design concept is the application-first concept by the co-design for power efficiency  ...  The results of other UK benchmarks were reported in [10] . Several open-source scientific applications are ported and evaluated on A64FX.  ... 
doi:10.1109/mm.2021.3136882 fatcat:fko4hlv4vjhttd2vtq4vvldgia

A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation [article]

Joseph Huber, Weile Wei, Giorgis Georgakoudis, Johannes Doerfert, Oscar Hernandez
2021 arXiv pre-print
This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical cluster approximation) application that targets the new ARM A64FX processor.  ...  By applying these code changes, code speed was increased by 1.98X and 78 GFlops were achieved on the A64FX processor.  ...  Acknowledgment The authors would like to thank Manuel Arenaz (Appentra Solutions), Hartmut Kaiser (Louisiana State University), and Kevin Huck (University of Oregon) for their guidance and feedback on  ... 
arXiv:2106.14332v1 fatcat:wqvga45fpnbbpisvmwnqbhl3te

Productivity meets Performance: Julia on A64FX [article]

Mosè Giordano, Milan Klöwer, Valentin Churavy
2022 arXiv pre-print
The goal of this paper is to explore performance of the Julia programming language on the A64FX processor, with a particular focus on reduced precision.  ...  Additionally, we investigate Message Passing Interface (MPI) scalability and throughput analysis on Fugaku showing next to no significant overheads of Julia of its MPI interface.  ...  The ShallowWaters.jl simulations were run on the Isambard UK National Tier-  ... 
arXiv:2207.12762v1 fatcat:nxww5xt27bbt3lclqyyi6houcy

ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX [article]

Christie Alappat, Nils Meyer, Jan Laukemann, Thomas Gruber, Georg Hager, Gerhard Wellein, Tilo Wettig
2021 arXiv pre-print
The A64FX CPU is arguably the most powerful Arm-based processor design to date.  ...  For SpMV we show why the CRS matrix storage format is not a good practical choice on this architecture and how the SELL-C-sigma format can achieve bandwidth saturation.  ...  [35] investigated stencil codes, proxy applications, SpMV and memory-bound fluid solvers on several Arm-based platforms including A64FX but did not provide detailed and validated performance models.  ... 
arXiv:2103.03013v2 fatcat:654bqrqianci7n3vtjmzd2pz7q

Execution‐Cache‐Memory modeling and performance tuning of sparse matrix‐vector multiplication and Lattice quantum chromodynamics on A64FX

Christie Alappat, Nils Meyer, Jan Laukemann, Thomas Gruber, Georg Hager, Gerhard Wellein, Tilo Wettig
2021 Concurrency and Computation  
The A64FX CPU is arguably the most powerful Arm-based processor design to date.  ...  For SpMV we show why the compressed row storage (CRS) matrix storage format is not a good practical choice on this architecture and how the SELL-C-𝜎 format can achieve bandwidth saturation.  ...  This work was supported in part by KONWIHR, by DFG in the framework of SFB/TRR 55 and by MEXT as "Program for Promoting Researches on the Supercomputer Fugaku" (Simulation for basic science: from fundamental  ... 
doi:10.1002/cpe.6512 fatcat:ffa5hgsa2fbrrksqgd6o2lavti

Performance Evaluation of ParalleX Execution model on Arm-based Platforms [article]

Nikunj Gupta, Rohit Ashiwal, Bine Brank, Sateesh K. Peddoju, Dirk Pleiter
2020 arXiv pre-print
In this paper, we port an Asynchronous Many-Task runtime system based on the ParalleX model, i.e., High Performance ParalleX (HPX), and evaluate it on the Arm ecosystem with a suite of benchmarks.  ...  We present the performance results on a variety of Arm processors and compare it with their x86 brethren from Intel.  ...  We report application execution time over kernel performance to investigate how the complete application scales on a distributed setting.  ... 
arXiv:2010.12195v1 fatcat:uofh3nlz7fg43fsiwexjdji6ci

Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX [article]

Christie L. Alappat, Jan Laukemann, Thomas Gruber, Georg Hager, Gerhard Wellein, Nils Meyer, Tilo Wettig
2020 arXiv pre-print
The A64FX CPU powers the current number one supercomputer on the Top500 list.  ...  Using these features, we construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops.  ...  One goal of our work is to investigate the reasons for this failure and how to mitigate it.  ... 
arXiv:2009.13903v1 fatcat:f5iikrcor5aurlwjpe4d74e3xy

mpiQulacs: A Distributed Quantum Computer Simulator for A64FX-based Cluster Systems [article]

Satoshi Imamura, Masafumi Yamazaki, Takumi Honda, Akihiko Kasagi, Akihiro Tabuchi, Hiroshi Nakao, Naoto Fukumoto, Kohta Nakashima
2022 arXiv pre-print
A64FX is an ARM-based CPU that is also equipped in the world's top Fugaku supercomputer.  ...  Quantum computer simulators running on classical computers are essential for developing real quantum computers and emerging quantum applications.  ...  ACKNOWLEDGEMENT The authors thank all of our colleagues who constructed the Todoroki cluster system. Their great effort helped us obtain the remarkable evaluation results of mpiQulacs.  ... 
arXiv:2203.16044v1 fatcat:phjrer7vcjgfxh2gruejkcu3ha

Grid on QPACE 4 [article]

Peter Georg, Nils Meyer, Stefan Solbrig, Tilo Wettig
2021 arXiv pre-print
In 2020 we deployed QPACE 4, which features 64 Fujitsu A64FX model FX700 processors interconnected by InfiniBand EDR. QPACE 4 runs an open-source software stack.  ...  We also present the benefits of an alternative data layout of complex numbers for the Domain Wall operator.  ...  Acknowledgment We acknowledge funding of the QPACE 4 project provided by the Deutsche Forschungsgemeinschaft (DFG) in the framework of SFB/TRR-55.  ... 
arXiv:2112.01852v1 fatcat:3xglmixivjgd3doxzhqrjdoqbu
Showing results 1-15 out of 81 results