Introduction
The analysis of complex, real-world data, particularly from dynamic systems exhibiting nonlinear and chaotic behavior (like industrial and weather phenomena), presents significant challenges. While multivariate entropy techniques offer a powerful approach to analyze multiple time series, existing methods often struggle with computational complexity, especially when dealing with large datasets or complex network structures. This paper addresses this challenge by introducing a new method, mvDEG, that combines the strengths of multivariate dispersion entropy with graph-based representations. The increasing availability of graph-structured data, driven by advancements in data collection and analysis across diverse fields, necessitates the development of tools capable of analyzing both the temporal dynamics and topological relationships inherent in these datasets. Current graph-based entropy methods frequently neglect temporal information, limiting their utility in understanding the interaction between spatial structure and temporal evolution. The mvDEG method is designed to overcome these limitations, providing a more comprehensive approach to analyzing complex multivariate time series data on graphs.
Literature Review
Multivariate entropy techniques, such as Multivariate Dispersion Entropy (mvDE) and Multivariate Sample Entropy, are used to quantify complexity and dynamics in multivariate data. mvDE offers advantages over other entropy methods like Multivariate Sample Entropy, exhibiting better performance, stability, and effectiveness with shorter time series. The growing importance of graph-based methods stems from the ability to represent complex relationships in various domains. Recent advancements include extending Permutation and Dispersion Entropy to graph data, enabling the incorporation of topological dimensions. This previous work on graph-based Permutation and Dispersion Entropy forms the foundation for this research, addressing the need for methods that integrate both temporal and spatial aspects of complex data. However, existing methods often focus on either topological or temporal information, but not both simultaneously. This limitation highlights the need for a method that combines these aspects for a more comprehensive analysis, especially in applications where understanding their interplay is crucial, such as climate science or industrial processes.
Methodology
The mvDEG method enhances dispersion entropy by integrating both temporal and topological data dimensions. It builds upon previous work on topological approaches to dispersion entropy and temporal-focused multivariate dispersion entropy. The core of the mvDEG algorithm comprises two steps:
1. **Coarse-Graining Process:** The multivariate signal is divided into non-overlapping segments of length τ (scale factor), and the average of each segment is calculated for each channel, generating coarse-grained signals. While the paper uses a straightforward coarse-graining approach, other methods could also be employed.
2. **Graph-Based Multivariate Dispersion Entropy Calculation:** An adjacency matrix, I<sub>p</sub>, representing the connectivity between channels (predefined, fully connected, or data-inferred), is used to analyze inter-channel interactions. The algorithm constructs an embedding matrix from the adjacency matrix of the Cartesian product graph ℙ<sub>𝑁</sub><sup>𝐼𝑝</sup>, a diagonal normalization matrix D, and a vectorized form of the multivariate signal X. Each entry of the embedding matrix is then mapped to one of *c* classes, the resulting dispersion patterns are identified, and their relative frequencies are used to compute the mvDEG value as a Shannon entropy. The main computational cost is raising the large product-graph adjacency matrix to successive powers, which the efficient implementation avoids.
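The coarse-graining step and the final class-mapping and entropy step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it applies the classic univariate dispersion-entropy recipe (normal-CDF class mapping, consecutive-sample patterns, normalized Shannon entropy) to a single coarse-grained channel and omits the graph-based embedding entirely. The function names `coarse_grain`, `map_to_classes`, and `dispersion_entropy` are hypothetical; `tau`, `m`, and `c` follow the symbols used in the text.

```python
import math
from collections import Counter


def coarse_grain(signal, tau):
    """Average non-overlapping windows of length tau in every channel.

    signal is a list of channels, each a list of samples of equal length.
    Trailing samples that do not fill a whole window are dropped.
    """
    n_windows = len(signal[0]) // tau
    return [
        [sum(ch[j * tau:(j + 1) * tau]) / tau for j in range(n_windows)]
        for ch in signal
    ]


def map_to_classes(x, c):
    """Map samples to integer classes 1..c through the normal CDF.

    Assumption: this is the mapping of classic dispersion entropy; the
    source does not spell out which mapping mvDEG uses.
    """
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sigma == 0:
        # Constant signal: place every sample in one middle class.
        return [(c + 1) // 2] * len(x)
    classes = []
    for v in x:
        y = 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
        classes.append(min(c, max(1, round(c * y + 0.5))))
    return classes


def dispersion_entropy(classes, m, c):
    """Shannon entropy of length-m patterns, normalized by log(c**m)."""
    patterns = Counter(
        tuple(classes[i:i + m]) for i in range(len(classes) - m + 1)
    )
    total = sum(patterns.values())
    h = -sum((k / total) * math.log(k / total) for k in patterns.values())
    return h / math.log(c ** m)
```

A multiscale profile for one channel would then be obtained by calling `coarse_grain` for each scale factor and passing the result through `map_to_classes` and `dispersion_entropy`. In mvDEG the patterns come from the graph-based embedding matrix rather than from consecutive samples of a single channel.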
The efficient implementation of mvDEG exploits matrix properties and Kronecker products to avoid the bottleneck of large matrix powers. The adjacency matrix of ℙ<sub>𝑁</sub><sup>𝐼𝑝</sup> is expressed as a sum of Kronecker products of smaller matrices, so the *m*-th power of the large matrix reduces to a sum of Kronecker products of smaller matrices. This substantially lowers the computational cost compared with raising the full matrix to a power directly.
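One standard identity consistent with this description is the Kronecker-sum form of a Cartesian product graph: if A and B are the adjacency matrices of the two factor graphs, the product graph has adjacency matrix A ⊗ I + I ⊗ B. Because the two terms commute, the binomial theorem gives (A ⊗ I + I ⊗ B)^m = Σ_k C(m, k) A^k ⊗ B^(m−k), so only powers of the small factors are needed. The sketch below checks this in pure Python with exact integer arithmetic. Whether the paper uses exactly this decomposition is an assumption, and all function names are hypothetical.

```python
from math import comb


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    """Plain matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matpow(a, m):
    """m-th power of a square matrix by repeated multiplication."""
    result = identity(len(a))
    for _ in range(m):
        result = matmul(result, a)
    return result


def kron(a, b):
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def cartesian_adjacency(a, b):
    """Adjacency matrix of the Cartesian product graph: A (x) I + I (x) B."""
    return matadd(kron(a, identity(len(b))), kron(identity(len(a)), b))


def cartesian_power(a, b, m):
    """m-th power of the product adjacency using only powers of A and B."""
    n = len(a) * len(b)
    total = [[0] * n for _ in range(n)]
    for k in range(m + 1):
        term = kron(matpow(a, k), matpow(b, m - k))
        weight = comb(m, k)
        total = matadd(total, [[weight * v for v in row] for row in term])
    return total
```

For example, with `a` the adjacency matrix of a short path graph (standing in for the temporal dimension) and `b` a small channel graph, `cartesian_power(a, b, m)` should equal `matpow(cartesian_adjacency(a, b), m)` while never multiplying two full-size matrices.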
Key Findings
The mvDEG method was validated using synthetic signals (multivariate 1/f noise and White Gaussian Noise, WGN) and real-world datasets (weather and two-phase flow data). For uncorrelated noise, mvDE and mvDEG produced similar results, both differentiating 1/f noise from WGN across scale factors, but mvDEG was cheaper to compute. For correlated noise, mvDEG yielded distinct entropy values for different correlation structures, while mvDE struggled to differentiate between them. This advantage is particularly evident with shorter time series (N=500), highlighting mvDEG's robustness in data-limited scenarios. In the computation-time comparison, mvDEG's time grew linearly with the number of vertices, while mvDE's time grew exponentially, giving mvDEG a significant advantage on large datasets.
The application of mvDEG to real-world data yielded insightful results:
* **Weather Data:** The analysis of meteorological data (temperature, wind speed, rainfall) from Brittany ground stations showed distinct entropy profiles for each variable, effectively capturing their varying complexities and demonstrating mvDEG's ability to handle spatial and temporal information simultaneously. The computational demands of the classical mvDE were significantly higher, rendering it impractical for this dataset. mvDEG's computational advantage allowed for analysis without memory errors, while traditional methods like univariate multiscale Dispersion Entropy and Sample Entropy failed to distinguish the differences between temperature and wind data.
* **Two-Phase Flow Data:** Applied to Electrical Resistance Tomography (ERT) data from two-phase flow experiments, mvDEG successfully distinguished between six different flow regimes (bubbly, stratified, slug, plug, churn, and annular). The method's sensitivity to the dynamics of each flow regime is reflected in the distinct entropy profiles obtained, particularly at lower scales. mvDEG processed the entire dataset without dimensionality reduction, unlike methods that require data reduction steps and might therefore discard important information. The state-of-the-art methods ran into memory errors on this dataset as well.
Discussion
The results demonstrate that mvDEG effectively addresses the limitations of existing multivariate entropy methods, offering a computationally efficient and robust approach for analyzing complex multivariate time series data within graph-based frameworks. Combining temporal and topological information enhances the analysis and provides a more comprehensive understanding of the underlying dynamics. The method's superior performance in differentiating correlated noise and its successful application to real-world datasets highlight its potential across various fields. Because mvDEG's computation time grows linearly with the number of vertices, it is applicable to large-scale and real-time analysis where traditional methods might fail. Its robustness with shorter time series is essential in scenarios where data availability is limited.
Conclusion
The Multivariate Multiscale Graph-based Dispersion Entropy (mvDEG) method presented in this paper provides a significant advancement in the field of multivariate time series analysis. Its unique combination of temporal and topological analysis capabilities, coupled with its superior computational efficiency, makes it a powerful tool for various applications. Future research could explore extensions of mvDEG to other graph-based entropy metrics, and its application to even larger and more complex datasets.
Limitations
While the mvDEG method offers significant advantages, potential limitations include the choice of the adjacency matrix I<sub>p</sub>. The performance of mvDEG could depend on the method used to construct this matrix. Furthermore, the selection of parameters like the embedding dimension (m) and the number of classes (c) can influence the results, requiring careful consideration in each specific application. The coarse-graining method used could also affect the results. Future studies could explore the effects of different coarse-graining approaches on the accuracy and performance of mvDEG.