Enhancing Indoor Temperature Forecasting through Synthetic Data in Low-Data Environments

Engineering and Technology


Z. Thiry, M. Ruocco, et al.

Zachari Thiry, Massimiliano Ruocco, Alessandro Nocente, and Michail Spitieris investigate the use of synthetic data to enhance indoor temperature forecasting for HVAC systems. By applying generative models such as GANs and VAEs, the study demonstrates measurable improvements in forecasting accuracy, even in data-scarce environments.
Introduction

Indoor temperature forecasting enables proactive HVAC control by predicting future indoor temperatures from historical data and environmental variables. Buildings account for 40% of energy consumption and 36% of CO2 emissions in the EU, making efficient HVAC operation crucial. Schedule-based control ignores exogenous factors such as weather, solar radiation, and occupancy, risking over- or under-conditioning and discomfort, whereas forecast-informed control can improve comfort and reduce energy use. Machine learning models, particularly RNNs such as LSTMs, have shown strong performance for indoor temperature forecasting. Rather than seeking the best forecaster, this work aims to enhance forecasting performance in low-data environments via synthetic data augmentation. Synthetic data generation, often via GANs and VAEs, has seen wide adoption across domains, but its impact on temperature forecasting with limited data remains underexplored. The paper investigates whether augmenting with synthetic time series improves downstream forecasting, and how best to fuse real and synthetic data. It first reviews state-of-the-art synthesizers, then presents the methodology, including the fusion of real and synthetic samples and class-imbalance mitigation, and finally reports experimental results.

Literature Review

Modern time series data augmentation approaches fall into traditional, GAN-based, and autoencoder-based techniques. Traditional methods (e.g., scaling, rotation) are simple and inexpensive but can disrupt temporal relationships, while generative models (GANs, VAEs) better preserve temporal dynamics. Surveys report many methods (e.g., 9 VAE-based and 14 GAN-based in one review); GANs are popular and offer sample diversity but are prone to convergence issues in low-data settings. Applications span energy forecasting using physical models and weather predictions, synthetic patent data for technology forecasting with GANs, and K-means-based generation, with mixed results for deep models. Given this landscape, the study focuses on deep learning synthesizers suited to low-data regimes: GANs for diversity and VAEs for training stability. Three notable models are selected: TimeGAN, DoppelGANger, and TimeVQVAE, whose mechanisms are detailed in the background theory.

Methodology

Background theory: TimeGAN augments adversarial training with a supervised loss, uses embedding and recovery networks to reduce the latent dimension, and jointly trains the generator and discriminator in latent space to capture temporal relationships. DoppelGANger targets fidelity and long-term correlations: it employs an auxiliary discriminator for metadata (unused here), constrains generation via randomized min-max scaling to mitigate mode collapse, integrates LSTM cells, and uses batched generation for efficiency and memory. TimeVQVAE applies VQ-VAE to discretize the latent space and avoid posterior collapse, learns a modified MaskGIT prior for faster, higher-quality sampling, and operates in a time-frequency (DFT) space, training separate models for the low- and high-frequency components.

Data acquisition and processing: Data were collected in a dedicated test facility (Test-cell) as a tabular time series with N = 59,040 rows and D = 81 features at a one-minute sampling rate. Each series has shape (240, D), representing four hours (L = 240). Acquisition proceeded in four phases (RICO): RICO1 (Jul-Aug 2023, 17 days, 102 series; some inconsistencies), RICO2 (Oct 2023, 10 days, 60 series; 3 h constrained + 1 h free fall), RICO3 (Jan 2024, 4 days, 24 series; 16 h constrained + 4 h free fall; only the first 4 h were used, yielding 6 usable series), and RICO4 (Feb 2024, 10 days, 60 series; highest quality). Features fall into five categories: identifiers (Phase, Step, Flag), setpoints (EC3, SB43, B46, SB47), features of interest (e.g., internal air temperature), environmental variables (weather), and control features (for integrity checks). The acquisition protocol sets random actuator setpoints every four hours from permitted values (e.g., heaters off/20/40/60 °C), each combination defining one series.

Preprocessing: 19 anomalous RICO1 series were manually excluded, along with the RICO2 series (format mismatch), most of RICO3 (6 series retained), and RICO4 series with missing values; each series carries an inclusion tag (1 = include, 0 = exclude). Standard scaling was applied.
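The slicing and scaling steps above can be sketched as follows; `make_series` and `standard_scale` are hypothetical helper names, and the random table merely stands in for the real Test-cell data:

```python
import numpy as np

def make_series(table: np.ndarray, length: int = 240) -> np.ndarray:
    """Slice a (rows, features) table into non-overlapping series of shape (length, features)."""
    n_series = table.shape[0] // length
    trimmed = table[: n_series * length]          # drop any incomplete trailing series
    return trimmed.reshape(n_series, length, table.shape[1])

def standard_scale(series: np.ndarray) -> np.ndarray:
    """Standard-scale each feature using statistics pooled over all series and time steps."""
    mean = series.mean(axis=(0, 1), keepdims=True)
    std = series.std(axis=(0, 1), keepdims=True)
    return (series - mean) / (std + 1e-8)

# Example: 59,040 rows x 81 features -> 246 series of 4 hours at 1-minute resolution
table = np.random.rand(59_040, 81)
series = standard_scale(make_series(table))
print(series.shape)  # (246, 240, 81)
```

In the paper, the excluded series would additionally be dropped via the inclusion tag before scaling.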
The data were then reshaped to (N, L, C) with C = 1, focusing on B.RTD3 (center-room temperature). Split: 20% of the series from each phase were reserved for testing, leaving train_real = 116 series and test_real = 31 series.

Labeling: For each series, take the first 3 hours (excluding the final hour), apply a 5-point moving average with edge padding, compute local derivatives, and assign one of three class labels: Monotonic Positive (0), Monotonic Negative (1), or Non-Monotonic (2).

Synthesizer training and evaluation: Synthesizers were trained on the 116-series training set, using the official TimeVQVAE [11] and TimeGAN [20] implementations and Gretel.ai's implementation of DoppelGANger. Training specifics: TimeGAN — various settings were tried without satisfactory convergence; DoppelGANger — sequence length 240, batch size 8, 1000 epochs; TimeVQVAE — 2000 epochs for the VQ-VAE and 10,000 for the prior (manual hyperparameter tuning). Synthesizers were evaluated with PCA, t-SNE, visual inspection, and a forecasting-utility metric (downstream LSTM performance).

Forecasting utility: The forecaster is a simple one-layer LSTM followed by a fully connected layer; hyperparameters were tuned manually on the training set. Inputs are sub-sampled by a factor of 10, and the horizon is 30 minutes ahead.

Experiment 1 (general augmentation): Train a synthesizer on train_real and sample 256 synthetic series (synth). Compare three strategies: TRTR (Train Real, Test Real), TSTR (Train Synthetic, Test Real), and TRSTR (Train Real + Synthetic, Test Real), training 100 forecasters per strategy with unique synthetic sets where applicable.

Experiment 2 (class imbalance): Construct imbalanced training sets by ablating a class i with ratio r ∈ {0.25, 0.5, 0.75, 1.0}, and train 12 synthesizers (3 classes × 4 ratios) on the ablated sets. Baseline LSTMs are trained on the ablated sets S_e^{t,r}; test LSTMs are trained on augmented sets in which missing-class samples are appended via conditional generation from the corresponding synthesizer Σ_r.

Metrics: MSE, MAE, MAPE, and MASE (with n equal to the prediction-window length, since the series show no seasonality).
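The labeling recipe above can be sketched as below; `label_series` and the tolerance `tol` are illustrative choices, since the paper does not state an exact derivative threshold:

```python
import numpy as np

def label_series(x: np.ndarray, constrained_len: int = 180, tol: float = 1e-3) -> int:
    """Label a length-240 series as Monotonic Positive (0), Monotonic Negative (1),
    or Non-Monotonic (2) from its first three hours, following the paper's recipe.
    The tolerance `tol` is an assumption: the paper gives no explicit threshold."""
    head = x[:constrained_len]                      # drop the final (free-fall) hour
    padded = np.pad(head, 2, mode="edge")           # edge padding for the moving average
    smooth = np.convolve(padded, np.ones(5) / 5, mode="valid")  # 5-point moving average
    deriv = np.diff(smooth)                         # local derivatives
    if np.all(deriv >= -tol):
        return 0                                    # Monotonic Positive
    if np.all(deriv <= tol):
        return 1                                    # Monotonic Negative
    return 2                                        # Non-Monotonic

print(label_series(np.linspace(0.0, 1.0, 240)))  # 0
```

A steadily falling series would map to class 1 and an oscillating one to class 2 under the same tolerance.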

Key Findings

Synthesizer performance: TimeGAN failed to converge and was excluded from downstream tasks. DoppelGANger generated plausible series but exhibited high-frequency noise artifacts and limited coverage of the data distribution (visible in sample plots, PCA, and t-SNE). TimeVQVAE produced diverse, plausible samples without noise artifacts and covered more of the data space in PCA; in t-SNE it extended beyond the original data boundaries but avoided the scattered correlation groups seen with DoppelGANger.

Forecasting Experiment 1 (general augmentation): Across 100 runs per strategy, adding synthetic data improved average forecasting accuracy over the TRTR baseline.

Aggregated means (lower is better):
  Strategy   test_mse    test_mase   test_mae    test_mape
  TRTR       0.003119    2.390434    0.037563    0.251342
  TSTR       0.001791    2.027180    0.030242    0.199215
  TRSTR      0.001714    1.854287    0.028266    0.162576

Aggregated standard deviations:
  Strategy   test_mse    test_mase   test_mae    test_mape
  TRTR       0.000579    0.290714    0.003708    0.012571
  TSTR       0.000570    0.346175    0.005047    0.044246
  TRSTR      0.000756    0.392574    0.005974    0.032716

Synthetic augmentation thus reduced mean errors but increased variance relative to the baseline, consistently across metrics.

Forecasting Experiment 2 (class imbalance): When class 0 was ablated with ratios r ∈ {0.25, 0.5, 0.75, 1.0}, augmenting with class-conditional synthetic samples produced no significant change in mean performance versus the baseline, and the likelihood distributions for baseline and augmented training overlapped. For example, at r = 0.75, baseline vs. augmented: test_mae 0.03457 vs. 0.03447; test_mape 0.23888 vs. 0.23835; test_mase 2.20503 vs. 2.19738; test_mse 0.00277 vs. 0.00277. Variance effects varied by metric and ratio, from minor decreases (~0.14%) to larger increases (up to ~55%).
Overall, synthetic augmentation improves accuracy in data-scarce training (Experiment 1), while class-balancing via synthetic data shows neutral mean impact with mixed variance changes (Experiment 2).
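For reference, a minimal sketch of the MASE metric reported above; the in-sample one-step naive scaling shown here is an assumption, as the summary only states that n equals the prediction-window length:

```python
import numpy as np

def mase(y_true: np.ndarray, y_pred: np.ndarray, y_insample: np.ndarray) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the MAE of an
    in-sample naive (previous-value) forecast. Values above 1 mean the model
    forecasts worse, on average, than the naive baseline."""
    mae_forecast = np.mean(np.abs(y_true - y_pred))
    mae_naive = np.mean(np.abs(np.diff(y_insample)))  # one-step naive errors
    return float(mae_forecast / mae_naive)

history = np.array([0.0, 1.0, 2.0, 3.0])            # in-sample series
print(mase(np.array([4.0, 5.0]), np.array([4.0, 6.0]), history))  # 0.5
```

Under this definition, the reported MASE values around 1.9-2.4 indicate errors roughly twice those of the naive baseline.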

Discussion

The study addresses the central question of whether synthetic data can enhance indoor temperature forecasting under limited real data. Results show that training solely on synthetic data (TSTR) already improves average forecasting metrics over TRTR, and combining real with synthetic data (TRSTR) yields the best mean performance across MSE, MAE, MAPE, and MASE. This suggests that well-trained synthesizers (notably TimeVQVAE) can capture key dynamics of indoor temperature series and provide beneficial diversity, reducing generalization error of a simple LSTM forecaster. However, the improvements come with increased variance, likely due to variability across generated datasets, indicating sensitivity to the quality and distributional alignment of synthetic samples. Visual and manifold analyses (PCA, t-SNE) explain part of the behavior: TimeVQVAE covers a broader region of the data manifold than DoppelGANger, correlating with better utility in forecasting, while TimeGAN’s failure to converge underscores GAN instability in low-data regimes. For class imbalance, conditional synthetic oversampling did not materially alter mean performance, potentially due to the test set’s imbalance and the relative simplicity of the task/model; nevertheless, variance changes suggest synthetic balancing can affect training stability. Overall, the findings advocate for synthetic augmentation to mitigate data scarcity in HVAC forecasting, while highlighting the need to manage variance and ensure synthesizer fidelity to the target distribution.

Conclusion

A VQ-VAE-based synthesizer (TimeVQVAE) outperforms the evaluated GAN-based methods for generating uni-variate time series in low-data settings. Augmenting limited real data with synthetic series improves forecasting accuracy for a simple LSTM model, with the best results when combining real and synthetic training data. Using synthetic data for class balancing showed negligible mean performance change, though it can influence variance. Future work includes: analyzing variance sources and mitigation strategies (e.g., filtering or selection of synthetic samples, ensemble or regularization techniques), evaluating on additional time series domains and multi-variate settings, exploring stronger forecasters and conditioning schemes, and improving test set balance and design to isolate class effects.

Limitations

Limitations include: low-data regime confined to a single test facility and primarily uni-variate forecasting (single channel B.RTD3), which may limit generalizability; TimeGAN instability and failure to converge restricts comparative conclusions across GAN variants; increased training variance when using synthetic data; manual exclusions and potential inconsistencies across acquisition phases; imbalanced and relatively small test set may confound class-imbalance conclusions; use of a simple LSTM forecaster and manual hyperparameter tuning; domain shift in t-SNE/PCA interpretations; lack of year-round data due to cost constraints.
