Result: Tomtom-lite: accelerating Tomtom enables large-scale and real-time motif similarity scoring.
Summary
Pairwise sequence similarity is a core operation in genomic analysis, yet most attention has been given to sequences made up of discrete characters. With the growing prevalence of machine learning, calculating similarities for sequences of continuous representations, e.g. frequency-based position-weight matrices (PWMs) and attribution-based contribution-weight matrices, is taking on newfound importance. Tomtom has previously been proposed as an algorithm for identifying pairs of PWMs whose similarity is statistically significant, but the implementation remains inefficient for both real-time and large-scale analysis. Accordingly, we have re-implemented Tomtom as a numba-accelerated Python function that is natively multi-threaded, avoids cache misses, more efficiently caches intermediate values, and uses approximations at compute bottlenecks. Here, we provide a detailed description of the original Tomtom method and present results demonstrating that our re-implementation can achieve over a 1000-fold speedup compared with the original tool on reasonable tasks.

Availability and implementation
Our implementation of Tomtom is freely available as a Python package at https://github.com/jmschrei/memesuite-lite, which can be downloaded via pip install memelite or at https://zenodo.org/records/17008952.

[ABSTRACT FROM AUTHOR]
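To make the idea of PWM-to-PWM similarity concrete, the following is a minimal NumPy sketch of scoring two PWMs at every ungapped offset and keeping the best one. It is not the Tomtom algorithm and not the memelite API: the function name `best_ungapped_alignment` and the per-column score (one minus the Euclidean distance scaled by sqrt(2)) are assumptions chosen for illustration, and the sketch omits the statistical-significance step that the abstract attributes to Tomtom.

```python
import numpy as np


def best_ungapped_alignment(query, target):
    """Return (best_score, best_offset) for two PWMs of shape (length, 4).

    Hypothetical helper for illustration only. The offset is the index of
    the target column that query column 0 is aligned to (may be negative).
    """
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    n_q, n_t = len(query), len(target)

    best_score, best_offset = -np.inf, None
    # Try every offset that leaves at least one overlapping column.
    for offset in range(-(n_q - 1), n_t):
        q_start = max(0, -offset)
        t_start = max(0, offset)
        overlap = min(n_q - q_start, n_t - t_start)

        q_cols = query[q_start:q_start + overlap]
        t_cols = target[t_start:t_start + overlap]

        # Toy per-column similarity (an assumption, not Tomtom's scoring):
        # 1 - Euclidean distance / sqrt(2), which lies in [0, 1] when each
        # column is a probability distribution.
        dist = np.sqrt(((q_cols - t_cols) ** 2).sum(axis=1))
        score = (1.0 - dist / np.sqrt(2.0)).sum()

        if score > best_score:
            best_score, best_offset = score, offset

    return best_score, best_offset
```

Because this toy score only sums over overlapping columns, it favors long overlaps and cannot say whether a match is better than chance; assessing that significance is the part of the problem the abstract describes Tomtom as addressing.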
Copyright of Bioinformatics is the property of Oxford University Press / USA and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)