Result: TaxaPLN: a taxonomy-aware augmentation strategy for microbiome-trait classification including metadata.

Title:
TaxaPLN: a taxonomy-aware augmentation strategy for microbiome-trait classification including metadata.
Authors:
Chaussard A; Laboratoire de Probabilités, Statistique et Modélisation, LPSM, Sorbonne Université, Université Paris Cité, CNRS, F-75005, Paris, France. alexandre.chaussard@sorbonne-universite.fr.
Bonnet A; Laboratoire de Probabilités, Statistique et Modélisation, LPSM, Sorbonne Université, Université Paris Cité, CNRS, F-75005, Paris, France.
Le Corff S; Laboratoire de Probabilités, Statistique et Modélisation, LPSM, Sorbonne Université, Université Paris Cité, CNRS, F-75005, Paris, France.
Sokol H; Centre de Recherche Saint-Antoine, CRSA, AP-HP, Sorbonne Université, INSERM UMRS-938, F-75012, Paris, France.; Gut, Liver & Microbiome Research, FHU, Paris, France.
Source:
BMC bioinformatics [BMC Bioinformatics] 2025 Nov 28; Vol. 27 (1), pp. 1. Date of Electronic Publication: 2025 Nov 28.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: BioMed Central; Country of Publication: England; NLM ID: 100965194; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2105 (Electronic); Linking ISSN: 14712105; NLM ISO Abbreviation: BMC Bioinformatics; Subsets: MEDLINE
Imprint Name(s):
Original Publication: [London] : BioMed Central, 2000-
Contributed Indexing:
Keywords: Data augmentation; Generative model; Microbiology; Variational inference
Entry Date(s):
Date Created: 20251129; Date Completed: 20260104; Latest Revision: 20260106
Update Code:
20260106
PubMed Central ID:
PMC12763835
DOI:
10.1186/s12859-025-06312-z
PMID:
41315930
Database:
MEDLINE

Further Information

Background: The gut microbiome plays a crucial role in human health, making it a cornerstone of modern biomedical research. To study its structure and dynamics, machine learning models are increasingly used to identify key microbial patterns associated with disease and environmental factors, but their performance is often limited by the intrinsic complexity of microbiome data and the small size of available cohorts. In this context, data augmentation has emerged as a promising strategy to overcome these challenges by generating artificial microbiome profiles.
Results: We introduce TaxaPLN, a data augmentation method based on PLN-Tree generative models, which leverages the taxonomy and a data-driven sampler to generate realistic synthetic microbiome compositions. Additionally, we propose a conditional extension based on feature-wise linear modulation, enabling covariate-aware generation. Experiments on diverse curated microbiome datasets show that TaxaPLN preserves ecological properties and generally improves or maintains predictive performance, outperforming state-of-the-art baselines on most tasks. Furthermore, the conditional variant of TaxaPLN establishes a new benchmark for metadata-aware microbiome augmentation.
Conclusion: TaxaPLN provides a model-based framework for augmenting microbiome datasets while preserving their ecological and clinical relevance. By integrating taxonomic structure and host metadata, it enhances predictive modeling across diverse real-world settings. To facilitate reproducible and scalable microbiome analysis using our method, TaxaPLN is released as an open-source Python package available on PyPI (plntree), with MIT-licensed source code hosted on GitHub (https://github.com/AlexandreChaussard/PLNTree-package).
(© 2025. The Author(s).)
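
Illustrative sketch: The abstract names two ingredients, a Poisson log-normal (PLN) generative model and feature-wise linear modulation for covariate-aware generation. The NumPy example below shows one way these two ideas can fit together. It is not the API of the plntree package and not the authors' PLN-Tree model: it assumes a flat, single-layer PLN with no taxonomy, and it assumes the modulation scale and shift are plain linear maps of the covariates. All function names, weight matrices, and dimensions are hypothetical.

```python
import numpy as np


def film(h, covariates, w_gamma, w_beta):
    """Feature-wise linear modulation: scale and shift h per feature.

    gamma and beta are computed from the covariates (here by linear maps,
    an assumption made for this sketch) and applied as gamma * h + beta.
    """
    gamma = covariates @ w_gamma  # (n_samples, n_features) scales
    beta = covariates @ w_beta    # (n_samples, n_features) shifts
    return gamma * h + beta


def sample_pln_counts(mu, cov, offsets, rng):
    """Sample counts from a flat Poisson log-normal model.

    Z ~ Normal(mu, cov) per sample, then X ~ Poisson(exp(offset + Z)).
    """
    chol = np.linalg.cholesky(cov)
    # Rows of (standard normal @ chol.T) have covariance chol @ chol.T = cov.
    z = mu + rng.standard_normal(mu.shape) @ chol.T
    return rng.poisson(np.exp(offsets[:, None] + z))


# Toy setup: every value below is made up for illustration.
rng = np.random.default_rng(0)
n_samples, n_taxa, n_covariates = 5, 4, 2

covariates = rng.standard_normal((n_samples, n_covariates))  # host metadata
base_mean = np.array([1.0, 0.5, -0.5, 0.0])                  # latent mean per taxon
w_gamma = 0.1 * rng.standard_normal((n_covariates, n_taxa))
w_beta = 0.1 * rng.standard_normal((n_covariates, n_taxa))

# Covariate-dependent latent means, one row per synthetic sample.
mu = film(base_mean, covariates, w_gamma, w_beta)

cov = 0.2 * np.eye(n_taxa) + 0.05                # positive-definite covariance
offsets = np.full(n_samples, np.log(1000.0))     # log sequencing depth

counts = sample_pln_counts(mu, cov, offsets, rng)
print(counts)
```

In this sketch the covariates only move the latent mean of each synthetic sample. A taxonomy-aware model such as the one described in the abstract would additionally constrain how counts propagate between taxonomic levels, which is deliberately left out here.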

Declarations. Ethics approval and consent to participate: Microbiome data used in this study originate from the publicly available curatedMetagenomicData database, which aggregates datasets approved by the respective institutional review boards. No additional ethics approval was required for our work. Consent for publication: Not applicable. Conflict of interest: The authors declare no conflict of interest.