Result: FM3VCF: A software library for accelerating the loading of large VCF files in genotype data analyses.

Title:
FM3VCF: A software library for accelerating the loading of large VCF files in genotype data analyses.
Authors:
Zuo, Zhen1 (AUTHOR), Li, Mingliang1 (AUTHOR), Li, Qi1 (AUTHOR), Li, Zhuo1 (AUTHOR), Liu, Defu2 (AUTHOR), Ye, Guanshi1 (AUTHOR), Tang, You1,2 (AUTHOR) tangyou9000@163.com
Source:
PLoS ONE. 6/4/2025, Vol. 20 Issue 6, p1-12. 12p.
Database:
Academic Search Index

More information

The increasing size of genotype data has made the loading of VCF files a computational bottleneck in various analyses, including imputation and genome-wide association studies (GWAS). To address this issue, we developed a software library, FM3VCF (fast M3VCF), that uses multiple CPU threads to accelerate this process and to compress VCF files into the more compact M3VCF format. FM3VCF can convert VCF files into M3VCF, the data format specific to MINIMAC4, and can efficiently read and parse data from VCF files. Compared with m3VCFtools, FM3VCF is approximately 36-fold faster at compressing VCF files to the M3VCF format. This acceleration addresses a limitation faced by MINIMAC4 when dealing with datasets containing millions of samples. Furthermore, when reading compressed VCF files, with decompression and parsing both included, FM3VCF is approximately 3 times faster than HTSlib. FM3VCF is an effective tool for both compressing VCF files efficiently and accelerating the loading of large VCF files in genotype data analyses. By fully utilizing multiple CPU threads, FM3VCF can significantly reduce the computational burden of various genomic analyses. [ABSTRACT FROM AUTHOR]
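
The abstract does not describe how FM3VCF distributes work across threads, so the following is only a minimal Python sketch of the general idea of chunked, multi-threaded VCF parsing, not the library's actual method or API. The names `parse_record`, `parse_chunk`, and `load_vcf_lines`, and the parameters `n_threads` and `chunk_size`, are hypothetical. In CPython the global interpreter lock limits the speed-up for pure-Python parsing, so the sketch illustrates the split, parse, and merge structure rather than real performance.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_record(line):
    """Split one VCF data line into (chrom, pos, genotypes)."""
    fields = line.rstrip("\n").split("\t")
    # Columns 0-8 are the fixed VCF fields; sample columns start at index 9.
    # The genotype (GT) is the first colon-separated item of each sample field.
    genotypes = [sample.split(":")[0] for sample in fields[9:]]
    return fields[0], int(fields[1]), genotypes


def parse_chunk(lines):
    """Parse a list of VCF data lines sequentially."""
    return [parse_record(line) for line in lines]


def load_vcf_lines(lines, n_threads=4, chunk_size=1000):
    """Parse VCF data lines in chunks on a thread pool, preserving input order."""
    # Drop blank lines and header lines (those starting with '#').
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parsed = pool.map(parse_chunk, chunks)  # results come back in input order
        return [rec for chunk in parsed for rec in chunk]
```

Because `pool.map` returns results in the order the chunks were submitted, the merged records keep the variant order of the input file, which downstream tools that expect position-sorted data generally rely on.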