A Python/Fortran Implementation of the Lattice‐Boltzmann Kernel on Multiple GPU Using the OpenACC Framework.
The increasing availability of GPU‐accelerated architectures for high‐performance computing presents new opportunities for scientific software, but also challenges, because porting legacy codes to accelerator platforms is complex. Directive‐based programming models such as OpenACC offer a minimally intrusive pathway to exploit GPU acceleration without requiring extensive rewriting of existing codes. This work presents a comprehensive performance and portability study of a Lattice‐Boltzmann method solver (PyLB), originally written in Python, mpi4py, and Fortran for CPU architectures and ported to GPUs by applying OpenACC directives to the Fortran routines. The performance of the solver is evaluated on NVIDIA V100, A100, and H100 GPUs available on the Jean Zay supercomputer at the Institute for Development and Resources in Intensive Scientific Computing (IDRIS) in France. Roofline analysis and extensive strong and weak scalability tests are conducted, showing that the GPU‐enabled version of PyLB scales efficiently across multiple GPUs. The solver achieves performance on the H100 GPU equivalent to thousands of CPU cores and shows strong energy and carbon efficiency advantages over traditional CPU‐based simulations. The implementation is validated using classical benchmarks, including the decaying Taylor‐Green vortex and the flow over a 3‐D sphere. The results confirm the physical accuracy of the GPU port while highlighting its computational and environmental advantages. [ABSTRACT FROM AUTHOR]
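For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of the textbook definitions behind a roofline bound and strong/weak scaling efficiency. It is not taken from PyLB: the function names are hypothetical, and the numbers in the usage section are placeholders chosen for illustration, not measurements from the paper.

```python
def roofline_attainable(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: attainable FLOP/s is limited either by the compute
    peak or by memory bandwidth (bytes/s) times arithmetic intensity
    (FLOP per byte moved), whichever is smaller."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def strong_scaling_efficiency(t_ref, t_n, n_ref, n):
    """Strong scaling: total problem size is fixed, so the ideal runtime
    on n devices is t_ref * n_ref / n. Efficiency is ideal / measured."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref, t_n):
    """Weak scaling: problem size per device is fixed, so the ideal
    runtime stays constant. Efficiency is reference / measured."""
    return t_ref / t_n


if __name__ == "__main__":
    # Placeholder values only (hypothetical device and timings).
    bound = roofline_attainable(peak_flops=10.0e12,
                                mem_bandwidth=1.0e12,
                                arithmetic_intensity=2.0)
    print(f"roofline bound: {bound:.3e} FLOP/s")
    print(f"strong scaling: {strong_scaling_efficiency(100.0, 30.0, 1, 4):.2f}")
    print(f"weak scaling:   {weak_scaling_efficiency(100.0, 125.0):.2f}")
```

Lattice‐Boltzmann kernels are commonly memory‐bound, in which case the bandwidth term of the roofline bound is the relevant limit; whether that holds for PyLB on each GPU is a question for the paper's own roofline analysis.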
Copyright of Concurrency & Computation: Practice & Experience is the property of Wiley-Blackwell.