Treffer: OmpSs-2@Cluster: Distributed memory execution of nested OpenMP-style tasks

Title:
OmpSs-2@Cluster: Distributed memory execution of nested OpenMP-style tasks
Contributors:
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Publisher Information:
Springer Nature
Publication Year:
2022
Collection:
Universitat Politècnica de Catalunya, BarcelonaTech: UPCommons - Global access to UPC knowledge
Document Type:
Konferenz conference object
File Description:
16 p.; application/pdf
Language:
English
Relation:
https://link.springer.com/chapter/10.1007/978-3-031-12597-3_20; info:eu-repo/grantAgreement/EC/H2020/754337/EU/Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon/EuroEXA; info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/; https://hdl.handle.net/2117/377512
DOI:
10.1007/978-3-031-12597-3_20
Rights:
Open Access
Accession Number:
edsbas.AFF8761
Database:
BASE

Weitere Informationen

State-of-the-art programming approaches generally have a strict division between intra-node shared memory parallelism and inter-node MPI communication. Tasking with dependencies offers a clean, dependable abstraction for a wide range of hardware and situations within a node, but research on task offloading between nodes is still relatively immature. This paper presents a flexible task offloading extension of the OmpSs-2 programming model, which inherits task ordering from a sequential version of the code and uses a common address space to avoid address translation and simplify the use of data structures with pointers. It uses weak dependencies to enable work to be created concurrently. The program is executed in distributed dataflow fashion, and the runtime system overlaps the construction of the distributed dependency graph, enforces dependencies, transfers data, and schedules tasks for execution. Asynchronous task parallelism avoids synchronization that is often required in MPI+OpenMP tasks. Task scheduling is flexible, and data location is tracked through the dependencies. We wish to enable future work in resiliency, scalability, load balancing and malleability, and therefore release all source code and examples open source. ; This research has received funding from the European Union’s Horizon 2020/EuroHPC research and innovation programme under grant agreement No 955606 (DEEP-SEA) and 754337 (EuroEXA). It is supported by the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB and Ramon y Cajal fellowship RYC2018-025628-I) and by the Generalitat de Catalunya (2017-SGR-1414). ; Peer Reviewed ; Postprint (author's final draft)