A self-organizing, living library of time-series data

Interdisciplinary Studies

B. D. Fulcher, C. H. Lubba, et al.

CompEngine is a web platform designed to support interdisciplinary collaboration among time-series researchers. Developed by Ben D. Fulcher and colleagues, it lets users upload data, explore similar datasets, and receive alerts for future matches, connecting researchers through the structure of their data and helping to bridge experimental and theoretical science.

Introduction
Time-series data pervade many scientific and commercial domains, yet methods and datasets are rarely compared across disciplinary boundaries. The authors argue that connecting researchers who study similar time-varying dynamics—whether empirical or model-generated—could accelerate understanding of underlying mechanisms and foster collaboration. A key barrier is identifying commonalities between datasets collected at different sampling rates, durations, and in different contexts. The paper introduces CompEngine, a self-organizing library that places time series from diverse origins into a shared feature space so that users can upload data and discover similar datasets immediately and in the future. The platform aims to provide empirical, data-driven connections, bridging theoretical models and real-world measurements, and to serve as a resource for evaluating time-series analysis algorithms on diverse empirical data.
Literature Review
The work builds on feature-based representations of time series, which map a sequence to a vector of numerical descriptors capturing properties such as value distributions, autocorrelation, stationarity, predictability, information-theoretic complexity, and model-fit characteristics. Prior efforts include highly comparative time-series analysis (hctsa), a framework implementing over 7000 features that has been applied to classification, clustering, forecasting, and anomaly detection. Dimensionality reduction and visualization techniques such as t-SNE have been used to reveal structure in high-dimensional feature spaces. Recent work produced catch22, a reduced, interpretable, and efficient 22-feature set distilled from hctsa that maintains strong performance across tasks while minimizing redundancy. These strands provide the methodological foundation for CompEngine’s feature-based similarity, organization, and interactive exploration.
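As a rough illustration of what a feature-based representation means, the sketch below maps a time series to a short vector of numerical descriptors. The three features used here (mean, standard deviation, lag-1 autocorrelation) and the function names are illustrative assumptions only; they are not the hctsa or catch22 feature definitions.

```python
import math
from statistics import mean, pstdev


def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1 (returns 0.0 for a constant series)."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    return num / denom


def toy_features(x):
    """Map a time series to a 3-element feature vector (hypothetical toy set)."""
    return [mean(x), pstdev(x), lag1_autocorrelation(x)]


# A slow sine wave changes little between samples: lag-1 autocorrelation near +1.
sine = [math.sin(2 * math.pi * t / 50) for t in range(500)]
# An alternating sequence flips sign every step: lag-1 autocorrelation near -1.
alternating = [(-1) ** t for t in range(500)]

print(toy_features(sine))
print(toy_features(alternating))
```

Two series of different origin but similar dynamics would receive similar feature vectors, which is the property the feature-based approach relies on.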
Methodology
Core representation and similarity: CompEngine treats each data object as a univariate, uniformly sampled time series (or, more generally, an ordered real-valued vector). A time series x of length T is mapped to a feature vector f in an F-dimensional space via algorithms that compute statistical and dynamical properties. Feature values are normalized across the dataset using an outlier-robust sigmoidal normalization to weight features equally. Similarity between two time series is computed as the Euclidean distance between their normalized feature vectors. For visualization, high-dimensional feature vectors can be projected into two dimensions using t-SNE to reveal structure.

Feature sets: The comprehensive hctsa library (>7000 features) demonstrates the feasibility of organizing diverse time series in feature space. For scalable, online computation, CompEngine uses catch22, an efficient set of 22 features implemented in C that captures a broad range of properties (autocorrelation, predictability, stationarity, value distribution, and self-affine scaling). Distances in catch22 space correlate strongly with those from the full hctsa space, enabling fast, approximate organization suitable for web deployment.

Demonstration dataset and organization: The authors computed feature vectors for a mixed dataset spanning simulated systems (deterministic dynamical systems, iterative maps, stochastic differential equations, random noise) and empirical systems (seismology, river flow, share prices, financial log returns, ionosphere, sound effects, animal sounds, music, ECG, RR intervals, gait). A two-dimensional t-SNE projection from the full hctsa space shows distinct regions corresponding to categories with similar dynamical properties, and overlaps indicating meaningful connections between empirical and model-generated dynamics.

Platform functionality: Users upload data in .txt, .csv, .xls, .xlsx (single column real numbers), or audio (.mp3, .wav). Audio is converted from the first channel with floating-point encoding and minimum 4 kHz sampling. Time series longer than 10,000 samples are truncated to the first 10,000 for computational efficiency. Bulk upload of multiple univariate series is supported. To contribute data permanently, users supply minimal metadata: Name, Sampling Rate, Description, Source, Category (with hierarchical placement), and free-form Tags; optional contact info enables match notifications. Individual uploads and proposed Categories require administrator approval; bulk uploads require prior approval.

Interactive analysis: For an uploaded or selected time series, CompEngine computes features and retrieves nearest neighbors (smallest Euclidean distances in feature space). Results are presented as an interactive network view (nodes are time series colored by category, edges reflect similarity) and a list view sorted by similarity. Users can filter by category, explore neighborhoods via double-click, inspect time traces and metadata, and view feature values with indicators highlighting unusually high/low values relative to the library. Matching data can be exported as JSON or compressed CSVs. The platform also allows browsing by Source, Category, Tag, and custom search. A public API provides programmatic access to individual series and subsets matching criteria. The full database is available for bulk download.
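The normalization, distance, and nearest-neighbor steps described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the platform's implementation: the summary does not give the normalization formula, so the code assumes one plausible outlier-robust form (a logistic function centred on the median and scaled by the interquartile range, then rescaled to the unit interval). The function names and the small feature library are hypothetical.

```python
import math
from statistics import median, quantiles


def _logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def robust_sigmoid_column(values):
    """Normalize one feature across the library to [0, 1].

    Assumed form: logistic centred on the median, scaled by IQR / 1.35
    (the IQR of a Gaussian is about 1.35 standard deviations).
    """
    med = median(values)
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        # Simplification: no spread in the middle half, treat as uninformative.
        return [0.5] * len(values)
    s = [_logistic((v - med) / (iqr / 1.35)) for v in values]
    lo, hi = min(s), max(s)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in s]


def normalize_library(rows):
    """Rows are time series, columns are features; normalize column-wise."""
    columns = [robust_sigmoid_column(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*columns)]


def nearest_neighbors(normalized, query_index, k):
    """Indices of the k rows closest to the query in Euclidean distance."""
    q = normalized[query_index]
    scored = [(math.dist(q, row), i)
              for i, row in enumerate(normalized) if i != query_index]
    return [i for _, i in sorted(scored)[:k]]


# Hypothetical library: five series, two raw features on different scales.
library = [
    [0.10, 5.0],  # 0
    [0.12, 5.2],  # 1: similar to series 0
    [0.90, 1.0],  # 2
    [0.88, 1.1],  # 3: similar to series 2
    [0.50, 3.0],  # 4: in between
]
normalized = normalize_library(library)
print(nearest_neighbors(normalized, 0, 2))
```

Normalizing each feature before computing distances keeps a feature with a large numerical range from dominating the Euclidean distance, which is the stated purpose of weighting features equally.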
Key Findings
- A common, feature-based representation meaningfully organizes diverse time series (empirical and simulated) into distinct yet interpretable regions in feature space, as visualized via t-SNE. Audio-related data (sound effects, animal sounds, music) cluster together; periodic dynamics cluster in another region; slowly fluctuating series (e.g., share prices) align with relevant stochastic differential equation models.
- The platform reveals interdisciplinary connections, such as overlaps between financial share prices and geometric Brownian motion SDEs, and between various audio sources and chaotic oscillatory model outputs.
- The reduced catch22 feature set structures datasets similarly to the full hctsa set, with pairwise distance correlations of r = 0.77 on the heterogeneous dataset analyzed, enabling scalable online computation without a software license.
- CompEngine launches with an initial library of over 24,000 diverse time series and supports automated similarity-based matching, interactive exploration, and programmatic access, facilitating both scientific discovery and algorithm evaluation across domains.
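A correlation between pairwise distances in two feature spaces, like the figure reported for catch22 versus hctsa, can be computed along the following lines. This sketch assumes a Pearson correlation over all pairs of series; the tiny "full" and "reduced" feature tables are invented for illustration and do not reproduce the reported r = 0.77.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of feature vectors."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Hypothetical data: four series in a 3-feature "full" space, and a
# "reduced" space that keeps only the first two features.
full = [
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.3],
]
reduced = [row[:2] for row in full]

r = pearson(pairwise_distances(full), pairwise_distances(reduced))
print(round(r, 3))
```

A value of r close to 1 means the reduced space ranks pairs of series as near or far in much the same way as the full space does.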
Discussion
By mapping time series from any domain into a shared feature space, CompEngine addresses the challenge of identifying cross-disciplinary commonalities. The observed clustering and overlaps in feature space demonstrate that empirical dynamical properties, rather than metadata, can organize datasets in ways that surface meaningful relationships. This supports two aims: (i) connecting theoreticians and experimentalists by aligning empirical data with model-generated dynamics, suggesting candidate mechanisms; and (ii) fostering collaboration between researchers studying different systems with similar temporal structure. Furthermore, access to a large, diverse, and evolving repository enables objective benchmarking of time-series analysis algorithms across varied empirical contexts, helping identify where methods excel or fail and guiding method development. The platform’s alerting and contact features further lower barriers to ongoing interdisciplinary engagement as the library grows.
Conclusion
The paper introduces CompEngine, a self-organizing, living library of time-series data that augments conventional metadata-driven repositories with a feature-based computational layer. This enables immediate, data-driven connections among diverse empirical and simulated time series, incentivizing data sharing by providing context at upload time and future match notifications. Demonstrations show meaningful organization of heterogeneous datasets and effective approximation using the efficient catch22 feature set. The resource facilitates interdisciplinary collaboration and systematic evaluation of time-series analysis methods. Looking ahead, the authors envisage continued growth of the library through community contributions and the extension of the self-organizing concept to other complex data types (e.g., networks, images, point clouds, multivariate classification datasets).
Limitations
- Data scope: The platform currently operates on univariate, uniformly sampled time series (or ordered real-valued vectors), which may limit applicability to multivariate or irregularly sampled data.
- Truncation: Time series longer than 10,000 samples are truncated to the first 10,000 for efficiency, potentially discarding long-range information.
- Approximation: Online organization relies on the 22-feature catch22 set; although distances correlate with those from the full hctsa library (r = 0.77), some information is inevitably lost relative to using thousands of features.
- Similarity dependence: Similarity is defined via Euclidean distance on normalized feature vectors; results depend on the chosen feature set and normalization scheme.
- Data quality and curation: Contributed datasets and categories require administrator approval; metadata quality depends on user input, which could affect interpretability and searchability.
- Visualization: Low-dimensional projections (e.g., t-SNE) are used for visualization and may not preserve all high-dimensional relationships.