TaxaPLN: a taxonomy-aware augmentation strategy for microbiome-trait classification including metadata.
Background: The gut microbiome plays a crucial role in human health, making it a cornerstone of modern biomedical research. To study its structure and dynamics, machine learning models are increasingly used to identify key microbial patterns associated with disease and environmental factors, but their performance is often limited by the intrinsic complexity of microbiome data and the small size of available cohorts. In this context, data augmentation has emerged as a promising strategy to overcome these challenges by generating artificial microbiome profiles.
Results: We introduce TaxaPLN, a data augmentation method based on PLN-Tree generative models, which leverages the taxonomy and a data-driven sampler to generate realistic synthetic microbiome compositions. Additionally, we propose a conditional extension based on feature-wise linear modulation, enabling covariate-aware generation. Experiments on diverse curated microbiome datasets show that TaxaPLN preserves ecological properties and generally improves or maintains predictive performance, outperforming state-of-the-art baselines on most tasks. Furthermore, the conditional variant of TaxaPLN establishes a new benchmark for metadata-aware microbiome augmentation.
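For readers unfamiliar with feature-wise linear modulation (FiLM), the idea is that covariates produce a per-feature scale and shift that are applied to a hidden representation. The NumPy sketch below only illustrates that general mechanism and is not TaxaPLN's implementation or the plntree API: the function `film`, its parameter names, and the choice of plain linear maps from covariates to scale and shift are all assumptions made for illustration.

```python
import numpy as np


def film(features, covariates, w_gamma, b_gamma, w_beta, b_beta):
    """Apply feature-wise linear modulation (hypothetical helper).

    features:   (n_samples, n_features) hidden representation
    covariates: (n_samples, n_covariates) host metadata, already numeric
    The scale (gamma) and shift (beta) are assumed to be linear functions
    of the covariates; a real model could use any learned network here.
    """
    gamma = covariates @ w_gamma + b_gamma  # per-sample, per-feature scale
    beta = covariates @ w_beta + b_beta     # per-sample, per-feature shift
    return gamma * features + beta


# Toy dimensions and random values, chosen only for the illustration.
rng = np.random.default_rng(0)
n_samples, n_covariates, n_features = 4, 3, 5
features = rng.normal(size=(n_samples, n_features))
covariates = rng.normal(size=(n_samples, n_covariates))

w_gamma = rng.normal(size=(n_covariates, n_features))
b_gamma = np.ones(n_features)
w_beta = rng.normal(size=(n_covariates, n_features))
b_beta = np.zeros(n_features)

modulated = film(features, covariates, w_gamma, b_gamma, w_beta, b_beta)

# With zero weights, unit scale bias and zero shift bias, FiLM is the identity.
zeros = np.zeros((n_covariates, n_features))
unmodulated = film(features, covariates, zeros, b_gamma, zeros, b_beta)
```

In this sketch the covariates never enter the representation directly; they only rescale and shift each feature, which is what allows a single generative model to be steered by metadata.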
Conclusion: TaxaPLN provides a model-based framework for augmenting microbiome datasets while preserving their ecological and clinical relevance. By integrating taxonomic structure and host metadata, it enhances predictive modeling across diverse real-world settings. To facilitate reproducible and scalable microbiome analysis using our method, TaxaPLN is released as an open-source Python package available on PyPI (plntree), with MIT-licensed source code hosted at https://github.com/AlexandreChaussard/PLNTree-package.
(© 2025. The Author(s).)
Declarations. Ethics approval and consent to participate: Microbiome data used in this study originate from the publicly available curatedMetagenomicData database, which aggregates datasets approved by the respective institutional review boards. No additional ethics approval was required for our work. Consent for publication: Not applicable. Conflict of interest: The authors declare no conflict of interest.