Title:
AudioSet-tools: a Python framework for taxonomy-aware AudioSet curation and reproducible audio research.
Source:
EURASIP Journal on Audio, Speech, and Music Processing; 12/2/2025, Vol. 2026 Issue 1, p1-29, 29p
Database:
Complementary Index

More Information

This work presents AudioSet-Tools, a modular and extensible Python framework designed to streamline the creation of task-specific datasets derived from Google AudioSet. Despite its extensive coverage, AudioSet suffers from weak labeling, class imbalance, and a loosely structured taxonomy, which hinder its applicability in machine listening workflows. AudioSet-Tools addresses these issues through configurable taxonomy-consistent label filtering and class rebalancing strategies. The framework includes automated routines for data download and transformation, enabling reproducible and semantically consistent dataset generation for pre-training and downstream fine-tuning of deep learning models. While domain-agnostic, we showcase its versatility through AudioSet-EV, a curated subset focused on emergency vehicle siren recognition — a socially relevant and technically challenging use case that highlights structural and semantic gaps in the AudioSet taxonomy. We further provide an extensive comparative benchmark of AudioSet-EV against state-of-the-art emergency vehicle corpora. All source code and datasets are openly released on GitHub and Zenodo, fostering transparency and reproducibility in real-world audio signal processing research. [ABSTRACT FROM AUTHOR]
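As a minimal sketch of the taxonomy-aware filtering the abstract describes, the snippet below expands a target class to all of its descendants in the AudioSet ontology and keeps only the segment rows that carry at least one of those labels. It is written against the public AudioSet release artifacts (ontology.json and segment CSVs such as balanced_train_segments.csv), not against the AudioSet-Tools API itself; the class name "Emergency vehicle" is assumed to match its ontology entry, and the file paths are placeholders.

import csv
import json

def load_ontology(path):
    # The public AudioSet ontology.json is a list of nodes with
    # "id" (machine ID), "name", and "child_ids"; index it by ID.
    with open(path, encoding="utf-8") as f:
        return {node["id"]: node for node in json.load(f)}

def descendants(ontology, root_name):
    # Walk the taxonomy from the named class and collect the MIDs
    # of the class itself plus every descendant class.
    by_name = {node["name"]: node["id"] for node in ontology.values()}
    stack, seen = [by_name[root_name]], set()
    while stack:
        mid = stack.pop()
        if mid in seen:
            continue
        seen.add(mid)
        stack.extend(ontology[mid].get("child_ids", []))
    return seen

def filter_segments(csv_path, keep_mids):
    # AudioSet segment CSVs hold: YTID, start_seconds, end_seconds,
    # and a quoted comma-separated list of positive label MIDs.
    # Header lines start with "#"; skipinitialspace handles the
    # space before the quoted label field.
    with open(csv_path, encoding="utf-8") as f:
        rows = csv.reader((l for l in f if not l.startswith("#")),
                          skipinitialspace=True)
        for ytid, start, end, labels in rows:
            label_set = labels.split(",")
            if keep_mids.intersection(label_set):
                yield ytid, float(start), float(end), label_set

ontology = load_ontology("ontology.json")            # placeholder path
ev_mids = descendants(ontology, "Emergency vehicle")  # assumed class name
for segment in filter_segments("balanced_train_segments.csv", ev_mids):
    print(segment)

A rebalancing pass of the kind the abstract mentions could then cap or subsample the yielded rows per class; that step, like the download and transformation routines, is left to the framework itself.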

Copyright of EURASIP Journal on Audio Speech & Music Processing is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)