Result: Automated parallel execution of distributed task graphs with FPGA clusters

Title:

Automated parallel execution of distributed task graphs with FPGA clusters

Authors:

De Haro Ruiz, Juan Miguel, Álvarez Martínez, Carlos, Jiménez González, Daniel, Martorell Bofill, Xavier, Ueno, Tomohiro, Sano, Kentaro, Ringlein, Burkhard, Abel, François, Weiss, Beat

Contributors:

Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. PM - Programming Models

Publisher Information:

Elsevier

Publication Year:

2024

Collection:

Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge

Subject Terms:

Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors, Supercomputers, Field programmable gate arrays, C++ (Computer program language), FPGA, MPI, Task graphs, Heterogeneous computing, High performance computing, Programming models, Distributed computing, Supercomputadors, Matrius de portes programables per l'usuari, C++ (Llenguatge de programació)

Document Type:

Academic journal; article in journal/newspaper

File Description:

17 p.; application/pdf

Language:

English

Relation:

https://www.sciencedirect.com/science/article/pii/S0167739X24003418; https://hdl.handle.net/2117/411487

DOI:

10.1016/j.future.2024.06.041

Availability:

https://hdl.handle.net/2117/411487
https://doi.org/10.1016/j.future.2024.06.041

Rights:

http://creativecommons.org/licenses/by/4.0/ ; Open Access ; Attribution 4.0 International

Accession Number:

edsbas.1DBB3854

Database:

BASE

Further information

Over the years, Field Programmable Gate Arrays (FPGAs) have been gaining popularity in the High Performance Computing (HPC) field, because their reconfigurability enables very fine-grained optimizations with low energy cost. However, the different characteristics, architectures, and network topologies of the clusters have hindered the use of FPGAs at a large scale. In this work, we present an evolution of OmpSs@FPGA, a high-level task-based programming model and extension to OmpSs-2, that aims at unifying all FPGA clusters by using a message-passing interface that is compatible with FPGA accelerators. These accelerators are programmed with C/C++ pragmas, and synthesized with High-Level Synthesis tools. The new framework includes a custom protocol to exchange messages between FPGAs, agnostic of the architecture and network type. On top of that, we present a new communication paradigm called Implicit Message Passing (IMP), where the user does not need to call any message-passing API. Instead, the runtime automatically infers data movement between nodes. We test classic message passing and IMP with three benchmarks on two different FPGA clusters. One is cloudFPGA, a disaggregated platform with AMD FPGAs that are only connected to the network through UDP/TCP/IP. The other is ESSPER, composed of CPU-attached Intel FPGAs that have a private network at the Ethernet level. In both cases, we demonstrate that IMP with OmpSs@FPGA can increase the productivity of FPGA programmers at a large scale by simplifying communication between nodes, without limiting the scalability of applications. We implement the N-body, Heat simulation and Cholesky decomposition benchmarks, and show that FPGA clusters get 2.6x and 2.4x better performance per watt than a CPU-only supercomputer for N-body and Heat, respectively.
This work was supported by the Horizon 2020 TEXTA-ROSSA project [grant number 956831]; the Spanish Government [grant numbers PCI2021-121964, PDC2022-133323-I00, PID2019-107255GBC21, MCIN/AEI/10.13039/501100011033]; the ...
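
The abstract states that, under Implicit Message Passing, the runtime infers data movement between nodes instead of the user calling a message-passing API. The toy model below is one way to picture that idea; it is a sketch under stated assumptions, not the OmpSs@FPGA runtime or its protocol. The names `Task` and `infer_transfers`, the static placement of tasks on nodes, and the rule that a write invalidates remote copies are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record: where it runs and which data blocks it touches."""
    name: str
    node: int            # node (e.g. one FPGA) the task is placed on
    reads: tuple = ()    # data blocks consumed by the task
    writes: tuple = ()   # data blocks produced by the task


def infer_transfers(tasks):
    """Walk tasks in program order and derive the transfers a runtime would need.

    Returns a list of (source_node, destination_node, block) tuples.
    Assumption: blocks that no task has written yet are already available
    on every node, so they never trigger a transfer.
    """
    valid = {}    # block -> set of nodes holding an up-to-date copy
    writer = {}   # block -> node of the most recent writer
    transfers = []
    for task in tasks:
        for block in task.reads:
            holders = valid.get(block)
            if holders is not None and task.node not in holders:
                # The consumer's node lacks the data: schedule a message.
                transfers.append((writer[block], task.node, block))
                holders.add(task.node)
        for block in task.writes:
            # A write makes the writer's node the only valid holder.
            valid[block] = {task.node}
            writer[block] = task.node
    return transfers


tasks = [
    Task("init_a", node=0, writes=("a",)),
    Task("init_b", node=1, writes=("b",)),
    Task("sum", node=1, reads=("a", "b"), writes=("c",)),
    Task("reuse_a", node=1, reads=("a",)),   # copy of "a" is already on node 1
    Task("report", node=0, reads=("c",)),
]
transfers = infer_transfers(tasks)
print(transfers)
```

In this model the program only declares what each task reads and writes. The two messages (block `a` from node 0 to node 1, then block `c` from node 1 to node 0) are derived from those declarations, and the second read of `a` on node 1 causes no extra traffic.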
