Enhancing Indoor Temperature Forecasting through Synthetic Data in Low-Data Environments

Engineering and Technology


Z. Thiry, M. Ruocco, et al.

Zachari Thiry, Massimiliano Ruocco, Alessandro Nocente, and Michail Spitieris investigate the use of synthetic data to enhance indoor temperature forecasting for HVAC systems. By applying generative models such as GANs and VAEs, the study demonstrates measurable improvements in forecasting accuracy, even in data-scarce environments.
Introduction

Indoor temperature forecasting enables proactive HVAC control by predicting future indoor temperatures from historical data and environmental variables. Buildings account for 40% of energy consumption and 36% of CO2 emissions in the EU, making efficient HVAC operation crucial. Schedule-based control ignores exogenous factors such as weather, solar radiation, and occupancy, risking over- or under-conditioning and discomfort, whereas forecast-informed control can improve comfort and reduce energy use. Machine learning models, particularly RNNs such as LSTMs, have shown strong performance for indoor temperature forecasting. Rather than seeking the best forecaster, this work aims to enhance forecasting performance in low-data environments via synthetic data augmentation. Synthetic data generation, often via GANs and VAEs, has seen wide adoption across domains, but its impact on temperature forecasting with limited data remains underexplored. The paper investigates whether augmenting with synthetic time series improves downstream forecasting, and how best to fuse real and synthetic data. It first reviews state-of-the-art synthesizers, then presents the methodology, including the fusion of real and synthetic samples and class-imbalance mitigation, and finally reports experimental results.

Literature Review

Modern time series data augmentation approaches fall into traditional, GAN-based, and autoencoder-based techniques. Traditional methods (e.g., scaling, rotation) are simple and inexpensive but can disrupt temporal relationships, while generative models (GANs, VAEs) better preserve temporal dynamics. Surveys report many methods (e.g., 9 VAE-based and 14 GAN-based in one review); GANs are popular and offer sample diversity but are prone to convergence issues in low-data settings. Applications span energy forecasting using physical models and weather predictions, synthetic patent data for technology forecasting with GANs, and K-means-based generation, with mixed results for deep models. Given this landscape, the study focuses on deep learning synthesizers suited to low-data regimes: GANs for diversity and VAEs for training stability. Three notable models are selected: TimeGAN, DoppelGANger, and TimeVQVAE, whose mechanisms are detailed in the background theory.

Methodology

Background theory: TimeGAN augments adversarial training with a supervised loss, uses embedding and recovery networks to reduce the latent dimension, and jointly trains the generator and discriminator in latent space to capture temporal relationships. DoppelGANger targets fidelity and long-term correlations: it employs an auxiliary discriminator for metadata (unused here), constrains generation via randomized min-max scaling to mitigate mode collapse, integrates LSTM cells, and uses batched generation for efficiency and memory. TimeVQVAE applies VQ-VAE to discretize the latent space and avoid posterior collapse, learns a modified MaskGIT prior for faster, higher-quality sampling, and operates in a time-frequency (DFT) space, training separate models for the low- and high-frequency components.

Data acquisition and processing: Data were collected in a dedicated test facility (Test-cell) as a tabular time series with N = 59,040 rows and D = 81 features at a one-minute sampling rate. Each series has shape (240, D), representing four hours (L = 240). Acquisition proceeded in four phases (RICO): RICO1 (Jul-Aug 2023, 17 days, 102 series; some inconsistencies), RICO2 (Oct 2023, 10 days, 60 series; 3 h constrained + 1 h free fall), RICO3 (Jan 2024, 4 days, 24 series; 16 h constrained + 4 h free fall; only the first 4 h were used, yielding 6 usable series), and RICO4 (Feb 2024, 10 days, 60 series; highest quality). Features fall into five categories: identifiers (Phase, Step, Flag), setpoints (EC3, SB43, B46, SB47), features of interest (e.g., internal air temperature), environmental variables (weather), and control features (for integrity checks). The acquisition protocol sets random actuator setpoints every four hours from permitted values (e.g., heaters off/20/40/60 °C), each combination defining one series.

Preprocessing: 19 anomalous RICO1 series were manually excluded, along with the RICO2 series (format mismatch), most of RICO3 (6 series retained), and RICO4 series with missing values; each series carries an inclusion tag (1 = include, 0 = exclude). Standard scaling was applied.
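The slicing and scaling steps above can be sketched as follows; `make_series` and `standard_scale` are hypothetical helper names, and the random table merely stands in for the real Test-cell data:

```python
import numpy as np

def make_series(table: np.ndarray, length: int = 240) -> np.ndarray:
    """Slice a (rows, features) table into non-overlapping series of shape (length, features)."""
    n_series = table.shape[0] // length
    trimmed = table[: n_series * length]          # drop any incomplete trailing series
    return trimmed.reshape(n_series, length, table.shape[1])

def standard_scale(series: np.ndarray) -> np.ndarray:
    """Standard-scale each feature using statistics pooled over all series and time steps."""
    mean = series.mean(axis=(0, 1), keepdims=True)
    std = series.std(axis=(0, 1), keepdims=True)
    return (series - mean) / (std + 1e-8)

# Example: 59,040 rows x 81 features -> 246 series of 4 hours at 1-minute resolution
table = np.random.rand(59_040, 81)
series = standard_scale(make_series(table))
print(series.shape)  # (246, 240, 81)
```

In the paper, the excluded series would additionally be dropped via the inclusion tag before scaling.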
The data were then reshaped to (N, L, C) with C = 1, focusing on B.RTD3 (center-room temperature). Split: 20% of the series from each phase were reserved for testing, leaving train_real = 116 series and test_real = 31 series.

Labeling: For each series, take the first 3 hours (excluding the final hour), apply a 5-point moving average with edge padding, compute local derivatives, and assign one of three class labels: Monotonic Positive (0), Monotonic Negative (1), or Non-Monotonic (2).

Synthesizer training and evaluation: Synthesizers were trained on the 116-series training set, using the official TimeVQVAE [11] and TimeGAN [20] implementations and Gretel.ai's implementation of DoppelGANger. Training specifics: TimeGAN — various settings were tried without satisfactory convergence; DoppelGANger — sequence length 240, batch size 8, 1000 epochs; TimeVQVAE — 2000 epochs for the VQ-VAE and 10,000 for the prior (manual hyperparameter tuning). Synthesizers were evaluated with PCA, t-SNE, visual inspection, and a forecasting-utility metric (downstream LSTM performance).

Forecasting utility: The forecaster is a simple one-layer LSTM followed by a fully connected layer; hyperparameters were tuned manually on the training set. Inputs are sub-sampled by a factor of 10, and the horizon is 30 minutes ahead.

Experiment 1 (general augmentation): Train a synthesizer on train_real and sample 256 synthetic series (synth). Compare three strategies: TRTR (Train Real, Test Real), TSTR (Train Synthetic, Test Real), and TRSTR (Train Real + Synthetic, Test Real), training 100 forecasters per strategy with unique synthetic sets where applicable.

Experiment 2 (class imbalance): Construct imbalanced training sets by ablating a class i with ratio r ∈ {0.25, 0.5, 0.75, 1.0}, and train 12 synthesizers (3 classes × 4 ratios) on the ablated sets. Baseline LSTMs are trained on the ablated sets S_e^{t,r}; test LSTMs are trained on augmented sets in which missing-class samples are appended via conditional generation from the corresponding synthesizer Σ_r.

Metrics: MSE, MAE, MAPE, and MASE (with n equal to the prediction-window length, since the series show no seasonality).
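The labeling recipe above can be sketched as below; `label_series` and the tolerance `tol` are illustrative choices, since the paper does not state an exact derivative threshold:

```python
import numpy as np

def label_series(x: np.ndarray, constrained_len: int = 180, tol: float = 1e-3) -> int:
    """Label a length-240 series as Monotonic Positive (0), Monotonic Negative (1),
    or Non-Monotonic (2) from its first three hours, following the paper's recipe.
    The tolerance `tol` is an assumption: the paper gives no explicit threshold."""
    head = x[:constrained_len]                      # drop the final (free-fall) hour
    padded = np.pad(head, 2, mode="edge")           # edge padding for the moving average
    smooth = np.convolve(padded, np.ones(5) / 5, mode="valid")  # 5-point moving average
    deriv = np.diff(smooth)                         # local derivatives
    if np.all(deriv >= -tol):
        return 0                                    # Monotonic Positive
    if np.all(deriv <= tol):
        return 1                                    # Monotonic Negative
    return 2                                        # Non-Monotonic

print(label_series(np.linspace(0.0, 1.0, 240)))  # 0
```

A steadily falling series would map to class 1 and an oscillating one to class 2 under the same tolerance.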

Key Findings

Synthesizer performance: TimeGAN failed to converge and was excluded from downstream tasks. DoppelGANger generated plausible series but exhibited high-frequency noise artifacts and limited coverage of the data distribution (visible in sample plots, PCA, and t-SNE). TimeVQVAE produced diverse, plausible samples without noise artifacts and covered more of the data space in PCA; in t-SNE it extended beyond the original data boundaries but avoided the scattered correlation groups seen with DoppelGANger.

Forecasting Experiment 1 (general augmentation): Across 100 runs per strategy, adding synthetic data improved average forecasting accuracy over the TRTR baseline.

Aggregated means (lower is better):
  Strategy   test_mse    test_mase   test_mae    test_mape
  TRTR       0.003119    2.390434    0.037563    0.251342
  TSTR       0.001791    2.027180    0.030242    0.199215
  TRSTR      0.001714    1.854287    0.028266    0.162576

Aggregated standard deviations:
  Strategy   test_mse    test_mase   test_mae    test_mape
  TRTR       0.000579    0.290714    0.003708    0.012571
  TSTR       0.000570    0.346175    0.005047    0.044246
  TRSTR      0.000756    0.392574    0.005974    0.032716

Synthetic augmentation thus reduced mean errors but increased variance relative to the baseline, consistently across metrics.

Forecasting Experiment 2 (class imbalance): When class 0 was ablated with ratios r ∈ {0.25, 0.5, 0.75, 1.0}, augmenting with class-conditional synthetic samples produced no significant change in mean performance versus the baseline, and the likelihood distributions for baseline and augmented training overlapped. For example, at r = 0.75, baseline vs. augmented: test_mae 0.03457 vs. 0.03447; test_mape 0.23888 vs. 0.23835; test_mase 2.20503 vs. 2.19738; test_mse 0.00277 vs. 0.00277. Variance effects varied by metric and ratio, from minor decreases (~0.14%) to larger increases (up to ~55%).
Overall, synthetic augmentation improves accuracy in data-scarce training (Experiment 1), while class-balancing via synthetic data shows neutral mean impact with mixed variance changes (Experiment 2).
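For reference, a minimal sketch of the MASE metric reported above; the in-sample one-step naive scaling shown here is an assumption, as the summary only states that n equals the prediction-window length:

```python
import numpy as np

def mase(y_true: np.ndarray, y_pred: np.ndarray, y_insample: np.ndarray) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the MAE of an
    in-sample naive (previous-value) forecast. Values above 1 mean the model
    forecasts worse, on average, than the naive baseline."""
    mae_forecast = np.mean(np.abs(y_true - y_pred))
    mae_naive = np.mean(np.abs(np.diff(y_insample)))  # one-step naive errors
    return float(mae_forecast / mae_naive)

history = np.array([0.0, 1.0, 2.0, 3.0])            # in-sample series
print(mase(np.array([4.0, 5.0]), np.array([4.0, 6.0]), history))  # 0.5
```

Under this definition, the reported MASE values around 1.9-2.4 indicate errors roughly twice those of the naive baseline.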

Discussion

The study addresses the central question of whether synthetic data can enhance indoor temperature forecasting under limited real data. Results show that training solely on synthetic data (TSTR) already improves average forecasting metrics over TRTR, and combining real with synthetic data (TRSTR) yields the best mean performance across MSE, MAE, MAPE, and MASE. This suggests that well-trained synthesizers (notably TimeVQVAE) can capture key dynamics of indoor temperature series and provide beneficial diversity, reducing generalization error of a simple LSTM forecaster. However, the improvements come with increased variance, likely due to variability across generated datasets, indicating sensitivity to the quality and distributional alignment of synthetic samples. Visual and manifold analyses (PCA, t-SNE) explain part of the behavior: TimeVQVAE covers a broader region of the data manifold than DoppelGANger, correlating with better utility in forecasting, while TimeGAN’s failure to converge underscores GAN instability in low-data regimes. For class imbalance, conditional synthetic oversampling did not materially alter mean performance, potentially due to the test set’s imbalance and the relative simplicity of the task/model; nevertheless, variance changes suggest synthetic balancing can affect training stability. Overall, the findings advocate for synthetic augmentation to mitigate data scarcity in HVAC forecasting, while highlighting the need to manage variance and ensure synthesizer fidelity to the target distribution.

Conclusion

A VQ-VAE-based synthesizer (TimeVQVAE) outperforms the evaluated GAN-based methods for generating uni-variate time series in low-data settings. Augmenting limited real data with synthetic series improves forecasting accuracy for a simple LSTM model, with the best results when combining real and synthetic training data. Using synthetic data for class balancing showed negligible mean performance change, though it can influence variance. Future work includes: analyzing variance sources and mitigation strategies (e.g., filtering or selection of synthetic samples, ensemble or regularization techniques), evaluating on additional time series domains and multi-variate settings, exploring stronger forecasters and conditioning schemes, and improving test set balance and design to isolate class effects.

Limitations

Limitations include: low-data regime confined to a single test facility and primarily uni-variate forecasting (single channel B.RTD3), which may limit generalizability; TimeGAN instability and failure to converge restricts comparative conclusions across GAN variants; increased training variance when using synthetic data; manual exclusions and potential inconsistencies across acquisition phases; imbalanced and relatively small test set may confound class-imbalance conclusions; use of a simple LSTM forecaster and manual hyperparameter tuning; domain shift in t-SNE/PCA interpretations; lack of year-round data due to cost constraints.
