Treffer: Performance-Driven Algorithm Engineering ; Optimising Pairwise Sequence Alignment and Pattern Matching Algorithms in the Era of Pangenomic Sequence Analysis ; Leistungsorientierte Algorithmenentwicklung

Title:

Performance-Driven Algorithm Engineering ; Optimising Pairwise Sequence Alignment and Pattern Matching Algorithms in the Era of Pangenomic Sequence Analysis ; Leistungsorientierte Algorithmenentwicklung

Authors:

Rahn, René

Contributors:

male, Reinert, Knut, Renard, Bernhard

Publication Year:

2023

Collection:

FU Berlin: Refubium

Subject Terms:

sequence alignment, SeqAn, pattern matching, computational pangenomics, hardware acceleration, ddc:004

Document Type:

Dissertation doctoral or postdoctoral thesis

File Description:

V, 185 Seiten; application/pdf

Language:

English

DOI:

10.17169/refubium-41699

Availability:

https://refubium.fu-berlin.de/handle/fub188/41976
https://doi.org/10.17169/refubium-41699
https://nbn-resolving.org/urn:nbn:de:kobv:188-refubium-41976-0

Rights:

https://creativecommons.org/licenses/by-sa/4.0/

Accession Number:

edsbas.9DEE603C

Database:

BASE

Weitere Informationen

A number of technological advancements in high-throughput genome sequencing have led to the generation of exabyte-scale sequencing data worldwide in recent years. These developments have facilitated large-scale resequencing projects like the 1000 Genomes Project, which aim to catalog the genetic diversity of organisms and speciﬁc populations. In the context of medical research, incorporating population data, is of particular interest. However, existing applications are often optimised for analysing a few sequences, and the methods used cannot be easily transferred to these vast datasets without exceeding system resources or producing results within a suitable time frame. Simultaneously, the execution model of processors has evolved from sequential to highly parallel process execution, thanks to the addition of processor cores, vector processing units, and advances in superscalar processor designs. This ongoing development in high-performance computing requires applications and algorithms to scale with the growing levels of parallelism. However, highly optimised algorithms are often embedded within applications, making them practically inaccessible to the scientiﬁc community. In this dissertation, we ﬁrst investigate a generic approach to parallelise and vectorise pairwise sequence alignments using a dynamically scalable concurrency model. We explore various techniques and code-level optimisations to effectively utilise the available parallelisms on modern high-performance CPUs. The results demonstrate that the dynamically accelerated pairwise sequence alignment scales proportionally with the number of cores and provides speed-ups of up to a factor of 2500 compared to the sequential reference implementation on modern hardware. Second, we propose a general solution for data-compressed acceleration of pattern matching algorithms by compressing a large collection of related DNA sequences and providing a set of composable algorithms to reﬁne and optimise the applicable operations. Our research on data-compressed ...

Treffer: Performance-Driven Algorithm Engineering ; Optimising Pairwise Sequence Alignment and Pattern Matching Algorithms in the Era of Pangenomic Sequence Analysis ; Leistungsorientierte Algorithmenentwicklung

Weitere Informationen

Links

Zusatz-Funktionen