Enhancing Indoor Temperature Forecasting through Synthetic Data in Low-Data Environments

Engineering and Technology

Z. Thiry, M. Ruocco, et al.

In this study, Zachari Thiry, Massimiliano Ruocco, Alessandro Nocente, and Michail Spitieris explore the potential of synthetic data, generated with deep models such as GANs and VAEs, to enhance indoor temperature forecasting. The work demonstrates how augmenting real data with synthetic series can significantly boost forecasting accuracy in low-data settings, addressing a critical challenge in HVAC system control.
Introduction

The study addresses indoor temperature forecasting for proactive HVAC control to improve comfort and reduce energy use. Buildings account for a large share of energy consumption and CO2 emissions in the EU, making efficient HVAC management critical. Traditional schedule-based control often ignores exogenous factors (weather, solar radiation, occupancy), potentially leading to discomfort and inefficiency. Machine learning, particularly RNNs such as LSTMs, has shown strong performance over physics-based approaches for this task. Rather than proposing a new forecasting architecture, this work focuses on enhancing forecasting performance in low-data settings by augmenting real data with synthetic time series. Synthetic data generation—commonly via GANs or VAEs—has grown across domains, but its impact on indoor temperature forecasting under data scarcity is underexplored. The research question is whether and how synthetic data, fused with real data or used to correct class imbalance, can improve forecasting accuracy and robustness in low-data environments. The paper reviews synthesizer methods, proposes fusion strategies, and empirically evaluates downstream forecasting utility.

Literature Review

Modern time series data augmentation methods fall into three categories: traditional transformations (e.g., scaling, rotation), GAN-based, and autoencoder-based approaches. Traditional methods are simple but can disrupt temporal relationships, prompting a shift toward generative models to preserve dynamics. Surveys report an overrepresentation of GANs, which tend to produce diverse data but face convergence issues in low-data regimes. Applications of synthetic time series span renewable energy forecasting, technology forecasting with GAN-augmented patent data, and traditional ML (e.g., k-means) for synthetic generation with mixed success when paired with deep forecasters. Based on this landscape, the study focuses on deep learning synthesis with GANs and VAEs, selecting three notable models: TimeGAN (adversarial + supervised losses with latent embedding/recovery networks), DoppelGANger (LSTM-based GAN with auxiliary discriminator for metadata, anti-mode-collapse strategies, and batched generation), and TimeVQVAE (VQ-VAE with discrete latent space, MaskGIT-style prior for sampling, and operating in time-frequency space with separate low/high-frequency pathways).

Methodology

Data acquisition and processing:

  • Dataset: Collected in a dedicated test facility ("Test-cell"). Tabular time series with N = 59,040 rows, D = 81 features, sampled at 1-minute intervals. A series is defined as 240 consecutive minutes (4 hours), shape (240, D). Analyses focus on the univariate channel B.RTD3 (room center temperature).
  • Acquisition phases (RICO):
      • RICO1 (Jul–Aug 2023): 17 days, 102 series (24,480 rows). Some inconsistencies due to sub-optimal tuning; most data behave normally.
      • RICO2 (Oct 2023): 10 days, 60 series (14,400 rows). Includes 1 hour of free-fall after 3 hours under constraint; excluded because of its different format.
      • RICO3 (Jan 2024): 4 days, 24 series (5,760 rows) with 16 h constrained + 4 h free-fall; only the first 4 h are used, yielding 6 usable series.
      • RICO4 (Feb 2024): 10 days, 60 series (14,400 rows); highest quality, fixed actuator tuning, no free-fall.
  • Feature categories: Identifiers (Phase, Step, Flag), Setpoints (EC3, SB43, B46, SB47) randomized every four hours, Features of Interest (e.g., internal air temperature), Environmental variables (weather metrics), Control features for integrity checks.
  • Inclusion/exclusion: Remove 19 anomalous RICO1 series, exclude RICO2 entirely (format mismatch), keep 6 compatible series from RICO3, and exclude RICO4 series with missing values. Series are tagged 1 for inclusion and 0 for exclusion.
  • Preprocessing: Standard scaling. Reshape to (N, L, C) with L = 240 and C = 1 (B.RTD3 only). Train/test split: 20% of the series from each phase are held out for testing, giving train_real = 116 series and test_real = 31 series.
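The scaling, reshaping, and per-phase split above can be sketched in a few lines of numpy. This is a minimal illustration; the function name, the dict input format, and the random seed are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def preprocess(series_by_phase, test_frac=0.2, seed=0):
    """Standard-scale the B.RTD3 channel and hold out 20% of each phase.

    series_by_phase: dict mapping phase name -> array of shape (n_series, 240).
    Returns train and test arrays of shape (N, 240, 1), i.e. (N, L, C).
    """
    rng = np.random.default_rng(seed)
    all_series = np.concatenate(list(series_by_phase.values()), axis=0)
    mu, sigma = all_series.mean(), all_series.std()  # standard scaling stats

    train, test = [], []
    for arr in series_by_phase.values():
        arr = (arr - mu) / sigma                     # standard scaling
        idx = rng.permutation(len(arr))              # shuffle within phase
        n_test = int(round(test_frac * len(arr)))    # 20% of this phase
        test.append(arr[idx[:n_test]])
        train.append(arr[idx[n_test:]])

    # reshape to (N, L, C) with L=240 and a single channel
    return np.concatenate(train)[..., None], np.concatenate(test)[..., None]
```

With the paper's counts (e.g. 147 retained series across phases), the same per-phase 20% rule yields the reported train_real/test_real sizes.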

Data labeling for conditional synthesis:

  • Use first 3 hours of each series (exclude final stable hour). Apply 5-point moving average (edge-repetition padding). Compute local derivatives. Assign classes: Monotonic Positive (0), Monotonic Negative (1), Non-Monotonic (2).
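The labeling rule above can be sketched in numpy as follows. The `tol` parameter and the tie-breaking for flat segments are assumptions of this sketch; the paper does not specify them.

```python
import numpy as np

MONO_POS, MONO_NEG, NON_MONO = 0, 1, 2  # class codes from the paper

def label_series(series, window=5, tol=0.0):
    """Label one (240,) series by the trend of its first 3 hours.

    Smooths with a centered 5-point moving average (edge values repeated
    as padding), takes local derivatives (first differences), and
    classifies the sign pattern of those derivatives.
    """
    x = np.asarray(series, dtype=float)[:180]  # exclude final stable hour
    pad = window // 2
    x_pad = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    smooth = np.convolve(x_pad, np.ones(window) / window, mode="valid")
    d = np.diff(smooth)                        # local derivatives
    if np.all(d >= -tol):
        return MONO_POS
    if np.all(d <= tol):
        return MONO_NEG
    return NON_MONO
```

A steadily rising series maps to class 0, a falling one to class 1, and anything oscillating within the first three hours to class 2.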

Synthesizer training and comparison:

  • Models: TimeVQVAE (primary), DoppelGANger, TimeGAN (baseline).
  • Implementations: TimeVQVAE per Lee et al. (2023), DoppelGANger via gretel-ai, TimeGAN via official code (Yoon et al.).
  • Training data: 116 real series from all phases.
  • Hyperparameters:
      • TimeGAN: multiple settings tried; failed to converge satisfactorily.
      • DoppelGANger: sequence length 240, batch size 8, 1000 epochs.
      • TimeVQVAE: 2000 epochs for the VQVAE; 10,000 epochs for prior learning (manual tuning found the base parameters optimal).

Evaluation of synthesizers:

  • Qualitative: Visual inspection of generated samples for plausibility/diversity.
  • Traditional metrics: PCA and t-SNE comparisons of synthetic vs real distributions to assess coverage and structure (with t-SNE caveats about distance interpretability).
  • Utility metric (downstream forecasting): Train simple forecasters with/without synthetic augmentation and evaluate on real test data.
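The PCA comparison can be approximated by projecting both real and synthetic sets onto the principal axes of the real data. This is a rough stand-in for a library PCA, not the authors' implementation; the paper does not specify its exact setup.

```python
import numpy as np

def pca_overlap(real, synth, k=2):
    """Project real and synthetic series onto the top-k PCs of the real data.

    real, synth: arrays of shape (n_series, length). Plotting the two
    returned projections together gives the coverage picture used to
    compare synthesizers (does synthetic data span the real manifold?).
    """
    mu = real.mean(axis=0)
    # principal axes of the centered real data via SVD
    _, _, vt = np.linalg.svd(real - mu, full_matrices=False)
    components = vt[:k]                      # (k, length)
    return (real - mu) @ components.T, (synth - mu) @ components.T
```

Fitting the axes on the real data only (and merely projecting the synthetic data) is what makes "coverage beyond the original data boundaries" visible.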

Forecasting model and utility experiments:

  • Forecaster: A one-layer LSTM followed by a fully connected layer, with manual hyperparameter tuning on the training set. It predicts the next 30 minutes from inputs sub-sampled by a factor of 10.
  • Experiment 1 (General augmentation): Train the synthesizer on train_real and sample 256 synthetic series (synth). Compare three strategies, 100 runs each:
      • TRTR: Train Real, Test Real (baseline).
      • TSTR: Train Synthetic (synth), Test Real.
      • TRSTR: Train Real + Synthetic, Test Real.
  • Experiment 2 (Class imbalance): Create imbalanced training sets by ablating class i with ratios r ∈ {0.25, 0.5, 0.75, 1.0}; train 12 synthesizers Σ^i (for each class and ratio). For each scenario, train baseline LSTMs on Set_r^i and augmented LSTMs on Set_r^i + conditionally generated synthetic samples to restore balance. Evaluate over 100 runs.
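As a rough sketch of the two experimental setups (function names and helper structure are my own, not from the paper):

```python
import numpy as np

def make_training_set(strategy, train_real, synth):
    """Experiment 1: assemble the training set for each strategy.

    TRTR: real only; TSTR: synthetic only; TRSTR: real + synthetic.
    All three are evaluated on the same held-out real test set.
    """
    if strategy == "TRTR":
        return train_real
    if strategy == "TSTR":
        return synth
    if strategy == "TRSTR":
        return np.concatenate([train_real, synth], axis=0)
    raise ValueError(f"unknown strategy: {strategy}")

def ablate_class(X, y, cls, r, seed=0):
    """Experiment 2: drop a fraction r of the series labeled `cls`.

    Returns the imbalanced training set plus the number of conditionally
    generated synthetic samples needed to restore the original balance.
    """
    rng = np.random.default_rng(seed)
    idx_cls = np.flatnonzero(y == cls)
    n_drop = int(round(r * len(idx_cls)))
    drop = rng.choice(idx_cls, size=n_drop, replace=False)
    keep = np.setdiff1d(np.arange(len(y)), drop)
    return X[keep], y[keep], n_drop
```

Running `ablate_class` for each class i and each ratio r in {0.25, 0.5, 0.75, 1.0} yields the 12 imbalanced scenarios, each paired with its own synthesizer Σ^i.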

Metrics for forecasting evaluation:

  • MSE, MAE, MAPE, and MASE (with the seasonal lag replaced by the forecast horizon length, since the series lack clear seasonality).
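A sketch of the adapted MASE, assuming (one plausible reading of the paper's variant) that the naive scaling forecast repeats the value from m steps back, with m set to the horizon length:

```python
import numpy as np

def mase(y_true, y_pred, history, m):
    """MASE where the seasonal lag m is the forecast horizon length.

    Scales the forecast MAE by the MAE of a naive lag-m forecast
    computed over the in-sample history.
    """
    mae_forecast = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    h = np.asarray(history, dtype=float)
    naive_mae = np.mean(np.abs(h[m:] - h[:-m]))  # naive: repeat value m steps back
    return mae_forecast / naive_mae
```

Values below 1 mean the forecaster beats the naive lag-m baseline on average.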

Key Findings

Synthesizer performance:

  • TimeGAN failed to converge satisfactorily and was excluded from downstream evaluations.
  • DoppelGANger generated plausible trends but introduced high-frequency noise-like artifacts and showed limited coverage/generalization in PCA/t-SNE space.
  • TimeVQVAE produced plausible and more diverse samples without the noisy artifacts and appeared to cover the data space more comprehensively in PCA; t-SNE showed extensions beyond original data boundaries (noting t-SNE interpretability limitations).

Experimental study 1 (General augmentation):

  • Training with synthetic data improved average forecasting performance vs baseline, with best results when combining real and synthetic data.
  • Aggregated means (lower is better):
      • TRTR: MSE 0.003119; MASE 2.390434; MAE 0.037563; MAPE 0.251342.
      • TSTR: MSE 0.001791; MASE 2.027180; MAE 0.030242; MAPE 0.199215.
      • TRSTR: MSE 0.001714; MASE 1.854287; MAE 0.028266; MAPE 0.162576.
  • Aggregated standard deviations:
      • TRTR: MSE 0.000579; MASE 0.290714; MAE 0.003708; MAPE 0.012571.
      • TSTR: MSE 0.000570; MASE 0.346175; MAE 0.005047; MAPE 0.044246.
      • TRSTR: MSE 0.000756; MASE 0.392574; MAE 0.005974; MAPE 0.032716.
  • Variance generally increased when adding synthetic data (TSTR, TRSTR), attributed to variability introduced by new generated samples.

Experimental study 2 (Class balancing):

  • No significant change in mean performance between baseline and augmented (balanced via synthetic) across ablation ratios; likelihood distributions overlapped.
  • Variance effects were metric- and ratio-dependent, ranging from ~0.14% decrease to ~55% increase.
  • Example results for Class 0 (means across 100 runs):
      • Baseline: r=0.25: MAE 0.04665, MAPE 0.31032, MASE 3.17415, MSE 0.00425; r=0.50: MAE 0.04671, MAPE 0.29151, MASE 3.18327, MSE 0.00418; r=0.75: MAE 0.03457, MAPE 0.23888, MASE 2.20503, MSE 0.00277; r=1.00: MAE 0.03140, MAPE 0.25104, MASE 2.10880, MSE 0.00217.
      • Augmented: r=0.25: MAE 0.04677, MAPE 0.31002, MASE 3.17502, MSE 0.00438; r=0.50: MAE 0.04547, MAPE 0.28877, MASE 3.07901, MSE 0.00399; r=0.75: MAE 0.03447, MAPE 0.23835, MASE 2.19738, MSE 0.00277; r=1.00: MAE 0.03243, MAPE 0.25472, MASE 2.18728, MSE 0.00229.
  • Standard deviations for Class 0 (selected, at r=0.75), baseline vs augmented: MAE 0.00463 vs 0.00431; MAPE 0.02228 vs 0.02324; MASE 0.36545 vs 0.33682; MSE 0.00047 vs 0.00044.

Discussion

The findings support the hypothesis that synthetic data can enhance indoor temperature forecasting in low-data environments. TimeVQVAE outperformed GAN-based synthesizers in this setting, yielding more diverse and artifact-free samples that better cover the real data manifold. In downstream forecasting, models trained on synthetic data (TSTR) or combined real+synthetic data (TRSTR) achieved lower errors than the real-only baseline (TRTR), demonstrating utility of synthetic augmentation for simple LSTM forecasters and short-horizon predictions. However, augmentation increased training variance, likely due to variability across synthetic draws and possible distributional shifts.

For class imbalance, conditional synthetic oversampling did not meaningfully change mean accuracy across metrics and ablation ratios, though it modestly affected variance in mixed directions. This suggests that for the present dataset and label definitions, imbalance correction via synthetic generation may not be the limiting factor for forecasting performance; other aspects (feature richness, model capacity, or label fidelity) may dominate. The qualitative and embedding-space analyses (PCA/t-SNE) corroborate that TimeVQVAE offers better coverage and plausibility than DoppelGANger, though t-SNE’s distance interpretations are limited.

Overall, synthetic augmentation is a promising route to mitigate data scarcity in building temperature forecasting, but care is needed to manage increased variance and to ensure synthetic data reflect the operational domain, especially when training forecasters for deployment.

Conclusion

A VQVAE-based synthesizer (TimeVQVAE) outperformed the investigated GAN-based methods (TimeGAN, DoppelGANger) for generating univariate time series in a low-data regime. Incorporating synthetic samples improved downstream forecasting accuracy, particularly when real and synthetic data were combined, albeit with increased training variance that warrants further study. Using synthetic generation to balance class distributions neither improved nor degraded mean performance in the tested setup, a result possibly influenced by imbalance in the test set itself.

Future work should: (1) investigate sources of increased variance and approaches to stabilize training with synthetic data; (2) refine evaluation on balanced and representative test sets; (3) test across additional domains and multivariate settings to assess generality; (4) explore stronger forecasters and domain-informed conditioning; and (5) study robustness to distribution shift and extreme scenarios not present in standard operation.

Limitations

  • Low-data setting from a single test facility; limited temporal coverage and diversity due to cost constraints and operational schedules.
  • Only one univariate channel (B.RTD3) used for synthesis/forecasting; multivariate dependencies not exploited.
  • TimeGAN failed to converge, limiting breadth of GAN comparisons.
  • Simple forecasting architecture (one-layer LSTM) may cap achievable performance and interaction with augmentation strategies.
  • Increased variance with synthetic augmentation; underlying causes not fully explained.
  • Class imbalance experiments showed small, metric-dependent variance changes and no mean gains; conclusions may be affected by imbalance in the test set.
  • PCA/t-SNE provide limited quantitative fidelity assessments; t-SNE distances are not directly interpretable.
  • RICO2 excluded due to protocol mismatch; RICO3 heavily subsampled to first 4h; potential selection bias.
  • Results specific to short-horizon forecasting (30 minutes) and may not generalize to longer horizons.