An optimized D2Q37 Lattice Boltzmann code on GP-GPUs

Luca Biferale, Filippo Mantovani, Marcello Pivanti, Fabio Pozzati, Mauro Sbragaglia, Andrea Scagliarini, Sebastiano Fabio Schifano, Federico Toschi, Raffaele Tripiccione
2013 Computers & Fluids  
We describe the implementation of a thermal compressible Lattice Boltzmann algorithm on an NVIDIA Tesla C2050 system based on the Fermi GP-GPU.  ...  We describe the overall organization of the algorithm and give details on its implementations. Efficiency ranges from 25% to 31% of the double precision peak performance of the GP-GPU.  ...  The support of Wilhelm Homberg and Jochen Kreutz is gratefully acknowledged.  ... 
doi:10.1016/j.compfluid.2012.06.003 fatcat:xpsp3l5nunc57b6ue727q2qm7m

A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

Enrico Calore, Sebastiano Fabio Schifano, Raffaele Tripiccione
2014 Procedia Computer Science  
In this paper we address precisely this issue, using as test-bench a Lattice Boltzmann code implemented in OpenCL.  ...  We analyze its performance on several different state-of-the-art processors: NVIDIA GPUs and Intel Xeon-Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity  ...  Acknowledgements This work was done in the framework of the COKA and Suma projects, supported by INFN.  ... 
doi:10.1016/j.procs.2014.05.004 fatcat:exrfaxm7nnhjdlectgdoop5cci

Early Experience on Porting and Running a Lattice Boltzmann Code on the Xeon-phi Co-Processor

G. Crimi, F. Mantovani, M. Pivanti, S.F. Schifano, R. Tripiccione
2013 Procedia Computer Science  
with the results obtained by previous implementations developed on state-of-the-art classic multi-core CPUs and GP-GPUs.  ...  In this paper we report on our early experience on porting, optimizing and benchmarking a Lattice Boltzmann (LB) code on the Xeon-Phi co-processor, the first generally available version of the new Many  ...  Acknowledgements This work was performed in the framework of the COKA and Suma projects, supported by Istituto Nazionale di Fisica Nucleare (INFN).  ... 
doi:10.1016/j.procs.2013.05.219 fatcat:oi23xqmynfgf5c2fbtr3pw364i

Early Experience on Using Knights Landing Processors for Lattice Boltzmann Applications [chapter]

Enrico Calore, Alessandro Gabbana, Sebastiano Fabio Schifano, Raffaele Tripiccione
2018 Lecture Notes in Computer Science  
We assess the performance of this processor for Lattice Boltzmann codes, widely used in computational fluid-dynamics.  ...  In our OpenMP code we consider several memory data-layouts that meet the conflicting computing requirements of distinct parts of the application, and sustain a large fraction of peak performance.  ...  This work was done in the framework of the COKA, COSA and SUMA projects of INFN. We would like to thank CINECA (Italy) for access to their HPC systems.  ... 
doi:10.1007/978-3-319-78024-5_45 fatcat:mih3anqr7zgxnhariki7iw4bjm

Optimization of lattice Boltzmann simulations on heterogeneous computers

E Calore, A Gabbana, SF Schifano, R Tripiccione
2017 The international journal of high performance computing applications  
In this paper we consider exactly this problem for a class of applications based on Lattice Boltzmann Methods, widely used in computational fluid-dynamics.  ...  We test the performance of our codes and their scaling properties using as testbeds HPC clusters incorporating different accelerators: Intel Xeon-Phi many-core processors, NVIDIA GPUs and AMD GPUs.  ...  and Suma projects of INFN.  ... 
doi:10.1177/1094342017703771 fatcat:ml4n5ulsk5hmnjeq6fbvm2yyfy

Performance and portability of accelerated lattice Boltzmann applications with OpenACC

Enrico Calore, Alessandro Gabbana, Jiri Kraus, Sebastiano Fabio Schifano, Raffaele Tripiccione
2016 Concurrency and Computation  
We then benchmark the code on a variety of processors, including traditional CPUs and GPUs, and make accurate performance comparisons with other GPU implementations of the same algorithm using CUDA and  ...  In this paper we address precisely this issue, using as a test-bench a massively parallel Lattice Boltzmann algorithm.  ...  This work was done in the framework of the COKA, COSA and Suma projects of INFN.  ... 
doi:10.1002/cpe.3862 fatcat:r5t72w47j5elfarytq64hfua7e

Massively parallel lattice–Boltzmann codes on large GPU clusters

E. Calore, A. Gabbana, J. Kraus, E. Pellegrini, S.F. Schifano, R. Tripiccione
2016 Parallel Computing  
This paper describes a massively parallel code for a state-of-the-art thermal lattice-Boltzmann method.  ...  Our code has been carefully optimized for performance on one GPU and to have a good scaling behavior extending to a large number of GPUs.  ...  GPUs.  ... 
doi:10.1016/j.parco.2016.08.005 fatcat:uvnhu5jhrba3bo324dyeahi6nm

Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters

Filippo Mantovani, Enrico Calore
2018 Journal of Low Power Electronics and Applications  
The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.  ...  For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics.  ...  We warmly thank the BSC tools group, supporting us for the smooth integration and test of our setup within Extrae and Paraver. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/jlpea8020013 fatcat:etpm56cutzhuhfhbtjprmhv5bu

A Holistic Scalable Implementation Approach of the Lattice Boltzmann Method for CPU/GPU Heterogeneous Clusters

Christoph Riesinger, Arash Bakhtiari, Martin Schreiber, Philipp Neumann, Hans-Joachim Bungartz
2017 Computation  
We utilize the lattice Boltzmann method for fluid flow as a representative of a scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters.  ...  Eventually, we come up with an implementation using all the available computational resources for the lattice Boltzmann method operators.  ...  Furthermore, OpenACC is a noticeable candidate when it comes to the development of portable code for multi-core CPUs and GPUs.  ... 
doi:10.3390/computation5040048 fatcat:aew7bpt7gbaafay2g6ay266gba

Software and DVFS Tuning for Performance and Energy-Efficiency on Intel KNL Processors

Enrico Calore, Alessandro Gabbana, Sebastiano Schifano, Raffaele Tripiccione
2018 Journal of Low Power Electronics and Applications  
As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture, and implemented using several different arrangements of the application data in memory (data-layouts,  ...  Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems.  ...  the design and implementation of the different data-structures and to the analysis of computing performances; and R.T. has designed and developed the Lattice Boltzmann algorithm.  ... 
doi:10.3390/jlpea8020018 fatcat:h6wevknrebfwxaqvcwrbywagja

Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications

Enrico Calore, Alessandro Gabbana, Sebastiano Fabio Schifano, Raffaele Tripiccione
2017 Concurrency and Computation  
We finally estimate the benefits obtainable running the full code on a HPC multi-GPU node, with respect to default clock frequency governors.  ...  We instrument our code to accurately monitor power consumption and execution time without the need of any additional hardware, and we enable it to change CPUs and GPUs clock frequencies while running.  ...  Acknowledgements This work was done in the framework of the COKA, COSA and Suma projects of INFN. We would like to thank all developers of the PAPI library (and especially V. M.  ... 
doi:10.1002/cpe.4143 fatcat:qs7g3m2jafgrzkhmaqfc4ioafy

Optimization of Multi-Phase Compressible Lattice Boltzmann Codes on Massively Parallel Multi-Core Systems

Luca Biferale, Filippo Mantovani, Marcello Pivanti, Fabio Pozzati, Mauro Sbragaglia, Andrea Scagliarini, Sebastiano Fabio Schifano, Federico Toschi, Raffaele Tripiccione
2011 Procedia Computer Science  
We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively parallel systems based on multi-core processors.  ...  We obtain a sustained performance for this ready-for-physics code that is a large fraction of peak.  ...  Acknowledgments: We would like to warmly thank all members of the AuroraScience team for their efforts in bringing the AuroraScience machine on-line and making it available for our tests.  ... 
doi:10.1016/j.procs.2011.04.105 fatcat:vklvfxd6hbelln5fwibxgrfzqe

Performance and Energy Assessment of a Lattice Boltzmann Method Based Application on the Skylake Processor

Ivan Girotto, Sebastiano Fabio Schifano, Enrico Calore, Gianluca Di Staso, Federico Toschi
2020 Computation  
This paper presents the performance analysis for both the computing performance and the energy efficiency of a Lattice Boltzmann Method (LBM) based application, used to simulate three-dimensional multicomponent  ...  We analyse the measured performances of the implemented data layouts on the Skylake processor while scaling the number of threads per socket.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/computation8020044 fatcat:zym6ledhnbhoffp2snusmzpufe

Performance Evaluation of Scientific Applications on POWER8 [chapter]

Andrew V. Adinetz, Paul F. Baumeister, Hans Böttiger, Thorsten Hater, Thilo Maurer, Dirk Pleiter, Wolfram Schenck, Sebastiano Fabio Schifano
2015 Lecture Notes in Computer Science  
This architecture features a moderate number of cores, each of which expose a high amount of instruction-level as well as thread-level parallelism.  ...  With POWER8 a new generation of POWER processors became available.  ...  Lattice Boltzmann Performance Results The Lattice Boltzmann (LB) method is widely used in computational fluid dynamics, to numerically solve the equation of motion of flows in two and three dimensions.  ... 
doi:10.1007/978-3-319-17248-4_2 fatcat:upzjxnqi4vaudcog7w2pxrpqtu

Energy-efficiency evaluation of Intel KNL for HPC workloads [article]

E. Calore, A. Gabbana, S.F. Schifano, R. Tripiccione
2018 pre-print
As a benchmark application we use a Lattice Boltzmann code heavily optimized for this architecture and implemented using different memory data layouts to store its lattice.  ...  Energy consumption is increasingly becoming a limiting factor to the design of faster large-scale parallel systems, and development of energy-efficient and energy-aware applications is today a relevant  ...  Acknowledgements This work was done in the framework of the COKA, COSA projects of INFN, and the PRIN2015 project of MIUR. We would like to thank CINECA (Italy) for access to their HPC systems.  ... 
doi:10.3233/978-1-61499-843-3-733 arXiv:1804.01911v1 fatcat:ztum5bxvj5cmdeulb4wktxwsq4