Result: Feature-Based Time-Series Analysis in R using the Theft Ecosystem.

Title:
Feature-Based Time-Series Analysis in R using the Theft Ecosystem.
Authors:
Henderson, Trent (then6675@uni.sydney.edu.au); Fulcher, Ben D. (ben.fulcher@sydney.edu.au)
Source:
R Journal. Sep 2025, Vol. 17 Issue 3, p43-68. 26p.
Database:
Academic Search Index

More Information

Time series are measured and analyzed across the sciences. One method of quantifying the structure of time series is by calculating a set of summary statistics or ‘features’, and then representing a time series in terms of its properties as a feature vector. The resulting feature space is interpretable and informative, and enables conventional statistical learning approaches, including clustering, regression, and classification, to be applied to time-series datasets. Many open-source software packages for computing sets of time-series features exist across multiple programming languages, including ‘catch22’ (22 features: Matlab, R, Python, Julia), ‘feasts’ (43 features: R), ‘tsfeatures’ (62 features: R), ‘Kats’ (40 features: Python), ‘tsfresh’ (783 features: Python), and ‘TSFEL’ (156 features: Python). However, there are several issues: (i) a singular access point to these packages is not currently available; (ii) to access all feature sets, users must be fluent in multiple languages; and (iii) these feature-extraction packages lack extensive accompanying methodological pipelines for performing feature-based time-series analysis, such as applications to time-series classification. Here we introduce a solution to these issues in the form of two complementary statistical software packages for R called ‘theft’: Tools for Handling Extraction of Features from Time series and ‘theftdlc’: theft ‘downloadable content’. ‘theft’ is a unified and extendable framework for computing features from the six open-source time-series feature sets listed above as well as custom user-specified features. ‘theftdlc’ is an extension package to ‘theft’ which includes a suite of functions for processing and interpreting the performance of extracted features, including extensive data-visualization templates, low-dimensional projections, and time-series classification.
With an increasing volume and complexity of large time-series datasets in the sciences and industry, ‘theft’ and ‘theftdlc’ provide a standardized framework for comprehensively quantifying and interpreting informative structure in time series. [ABSTRACT FROM AUTHOR]
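
Illustrative sketch (not part of the original record): the abstract's central idea, representing each time series as a vector of summary statistics, can be shown with a small example. The code below is a minimal sketch in Python using only the standard library. It does not use ‘theft’, ‘theftdlc’, or any of the six feature sets named above, and the four features, the function names (`lag1_autocorr`, `trend_slope`, `feature_vector`), and the synthetic series are all assumptions chosen for illustration.

```python
import math
import random
import statistics

# Hypothetical feature names chosen for this sketch; not drawn from any
# of the feature sets named in the abstract.
FEATURE_NAMES = ["mean", "std", "lag1_autocorr", "trend_slope"]


def lag1_autocorr(x):
    """Lag-1 autocorrelation; returns 0.0 for a constant series."""
    m = statistics.fmean(x)
    centred = [v - m for v in x]
    denom = sum(c * c for c in centred)
    if denom == 0.0:
        return 0.0
    num = sum(a * b for a, b in zip(centred[:-1], centred[1:]))
    return num / denom


def trend_slope(x):
    """Least-squares slope of x against its time index 0..n-1 (needs n >= 2)."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = statistics.fmean(x)
    num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def feature_vector(x):
    """Map one time series to a fixed-length list ordered as FEATURE_NAMES."""
    return [
        statistics.fmean(x),
        statistics.pstdev(x),
        lag1_autocorr(x),
        trend_slope(x),
    ]


# Three synthetic series with different structure (illustrative data only).
random.seed(0)
n = 200
series = {
    "noise": [random.gauss(0, 1) for _ in range(n)],
    "sine": [math.sin(2 * math.pi * t / 50) for t in range(n)],
    "trend": [0.05 * t + random.gauss(0, 0.1) for t in range(n)],
}

# One row per series, one column per feature: the "feature space".
feature_matrix = {name: feature_vector(x) for name, x in series.items()}

for name, row in feature_matrix.items():
    print(name, dict(zip(FEATURE_NAMES, (round(v, 3) for v in row))))
```

Each row of `feature_matrix` is a point in a four-dimensional feature space, so series of any length become directly comparable, and standard clustering or classification methods could be applied to the rows. Published feature sets compute far more features than this sketch does.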