Introducing edge intelligence to smart meters via federated split learning
Y. Li, D. Qin, et al.
Electric power systems contribute over 40% of global CO2 emissions, and integrating high shares of renewables requires harnessing demand-side flexibility enabled by smart meters. Despite their ubiquity and the massive data they collect, current smart meters lack on-device intelligence due to strict hardware constraints (limited memory/compute/communication) and privacy concerns around centralized data use. This creates barriers to utilizing distributed big data from meters and to training complex models locally. The study addresses two key questions: how to efficiently utilize distributed smart meter data while preserving privacy, and how to train accurate models on resource-constrained meters. The authors propose an end-edge-cloud framework combining federated learning (to exploit distributed data with privacy enhancements) and split learning (to offload heavy computation to edge/cloud) to enable on-device load forecasting with high efficiency in memory, computation, and communication.
Edge intelligence seeks to push AI toward end devices while leveraging edge/cloud resources. Computation offloading mitigates device constraints; split learning allows a deep network to be partitioned across entities, easing on-device memory and compute loads. Federated learning enables collaborative model training without sharing raw data and has been successful in healthcare, finance, and energy. However, FL entails significant communication overhead (frequent exchange of parameters/gradients) and, in synchronous aggregation, delays due to straggler devices; asynchronous aggregation can speed training but may harm convergence accuracy. Prior edge intelligence studies in energy often remain at simulation level and do not realize methods on constrained smart meter hardware. There is a gap for a unified framework that jointly considers accuracy, on-device memory footprint, training speed, and communication overhead for smart meter intelligence.
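As a point of reference for the federated learning background above, the following is a minimal sketch of sample-weighted parameter averaging in the style of FedAvg. It is not taken from the paper: the function name `fedavg`, the dictionary layout of the parameters, and the client sample counts are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: each model is a dict mapping a parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def fedavg(updates: List[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one client must contribute samples")
    reference = updates[0][0]
    merged: Params = {}
    for name, values in reference.items():
        merged[name] = [
            sum(params[name][i] * n for params, n in updates) / total
            for i in range(len(values))
        ]
    return merged


# Two hypothetical clients; only parameters and counts are shared, never raw load data.
clients = [
    ({"w": [1.0, 2.0], "b": [0.0]}, 100),
    ({"w": [3.0, 4.0], "b": [1.0]}, 300),
]
global_params = fedavg(clients)
print(global_params)
```

Every round repeats this exchange for all parameters, which is where the communication overhead noted above comes from; a synchronous server also has to wait for the slowest client before it can call the averaging step.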
Framework: A three-tier end-edge-cloud design where smart meters (end), edge servers, and a cloud server collaboratively train forecasting models in a privacy-enhancing manner.
- Model splitting: The cloud selects an efficiency-optimal split ratio to minimize training time subject to smart meter memory constraints. The model is split into: feature extractor (end/smart meter), feature processor (edge server), and regressor (end). Input/output layers remain on the meter to protect raw data. An analysis derives the peak memory footprint as a function of split ratio and provides bounds ensuring SRAM constraints (192 KB) are respected. A theorem provides the efficiency-optimal split ratio given device compute powers and communication rates.
- Collaborative training with parallelism and knowledge distillation: An auxiliary lightweight regressor is added on the meter to enable parallel backpropagation: the main model (feature extractor + edge feature processor + main regressor) and the auxiliary model (feature extractor + auxiliary regressor) backpropagate in parallel. This removes the need to transmit the split-layer gradient back from the edge, cutting communication and backprop time. Knowledge distillation couples the auxiliary and main objectives by adding a distillation term aligning the auxiliary prediction with the main model’s output, stabilizing and improving convergence accuracy without extra memory.
- Semi-asynchronous hierarchical aggregation: To train global models over distributed meters, a two-stage aggregation is used. End-edge synchronous aggregation: meters are clustered by hardware/communication characteristics (e.g., compute power P_m, link rate R) using balanced k-means; each edge synchronously aggregates its cluster to mitigate straggler effects within clusters. Edge-cloud asynchronous aggregation: the cloud updates the global model as edge-aggregated models arrive, with weighted updates. This combines the stability of synchronous aggregation with the speed of asynchronous updates and reduces overall delay.
Hardware platform and experimental setup: Real hardware instantiation includes one tower server (cloud), three PCs (edges), and 30 ARM Cortex-M4 MCUs representing smart meters (192 KB SRAM, 1 MB FLASH, up to 168 MHz; RS485 at 115.2 kbps). Two datasets are used: BDG2 (hourly building loads across North America/Europe, 2016–2017) and CBTs (Irish residential, 30-min data aggregated to hourly, 2009–2010). For each MCU, data for 30 randomly selected buildings/homes are preloaded; first year for training and following half-year for testing. Features include historical load and calendar variables only. Baseline model is MLP; small (single hidden layer) and large (multiple hidden layers) variants are considered. Metrics: MAE, RMSE, MAPE (SMAPE for CBTs instead of MAPE), plus peak memory footprint, training time, and per-round communication overhead. Theoretical convergence analyses under standard smoothness and bounded gradient assumptions are provided for both auxiliary and main models.
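To make the model-splitting step concrete, here is a rough sketch of choosing a split point under the 192 KB SRAM budget. It does not reproduce the paper's memory analysis or its theorem for the optimal split ratio; instead it brute-forces every split of a small MLP using a toy cost model. The layer widths, the compute rates, the memory formula, and the traffic formula (plain split learning, with activations and gradients crossing the link) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

SRAM_BYTES = 192 * 1024   # meter SRAM budget reported in the text
BYTES_PER_VALUE = 4       # assumption: float32 weights and activations
FLOPS_PER_PARAM = 6       # assumption: rough forward + backward cost per weight


def dense_params(widths: List[int], i: int) -> int:
    """Weights plus biases of dense layer i (widths[i] -> widths[i + 1])."""
    return widths[i] * widths[i + 1] + widths[i + 1]


def evaluate_split(widths: List[int], k: int, meter_flops: float,
                   edge_flops: float, link_bytes_per_s: float) -> Tuple[int, float]:
    """Meter keeps the first k layers and the last layer; the edge keeps the rest.

    Returns (meter memory in bytes, seconds per training sample).
    """
    n_layers = len(widths) - 1
    meter_layers = list(range(k)) + [n_layers - 1]
    edge_layers = list(range(k, n_layers - 1))
    meter_p = sum(dense_params(widths, i) for i in meter_layers)
    edge_p = sum(dense_params(widths, i) for i in edge_layers)
    # Toy memory model: weights, one gradient per weight, the input, layer outputs.
    stored = widths[0] + sum(widths[i + 1] for i in meter_layers)
    meter_bytes = (2 * meter_p + stored) * BYTES_PER_VALUE
    # Toy traffic model: activations to and from the edge, plus same-size gradients.
    sent = 2 * (widths[k] + widths[n_layers - 1]) if edge_layers else 0
    seconds = (FLOPS_PER_PARAM * meter_p / meter_flops
               + FLOPS_PER_PARAM * edge_p / edge_flops
               + sent * BYTES_PER_VALUE / link_bytes_per_s)
    return meter_bytes, seconds


def best_split(widths: List[int], meter_flops: float, edge_flops: float,
               link_bytes_per_s: float) -> Optional[Tuple[int, int, float]]:
    """Fastest split (k, meter_bytes, seconds) that fits in SRAM, or None."""
    feasible = []
    for k in range(1, len(widths) - 1):  # k >= 1 keeps the input layer on the meter
        meter_bytes, seconds = evaluate_split(
            widths, k, meter_flops, edge_flops, link_bytes_per_s)
        if meter_bytes <= SRAM_BYTES:
            feasible.append((seconds, k, meter_bytes))
    if not feasible:
        return None
    seconds, k, meter_bytes = min(feasible)
    return k, meter_bytes, seconds


# Hypothetical MLP: 48 input features, four hidden layers, 24 outputs.
widths = [48, 128, 64, 256, 64, 24]
choice = best_split(widths, meter_flops=1e7, edge_flops=1e10,
                    link_bytes_per_s=115200 / 8)  # RS485 rate, framing ignored
print(choice)
```

Under these assumed numbers the search keeps two layers on the meter: cutting after the narrower second layer sends less data over the slow link than cutting after the first, while keeping a third layer would exceed the SRAM budget. A closed-form rule such as the paper's theorem would replace the loop in `best_split`.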
- Hardware feasibility: Successfully trains complex models on real smart meter-class MCUs with only 192 KB SRAM by splitting and offloading computation to edges.
- Efficiency gains: Relative to distributed baselines, the method reduces memory footprint by 95.5%, training time by 94.8%, and communication burden by 50%. The tabulated results show a 22.4× memory saving, a 19.23× training time reduction, and 2.01× lower communication overhead per round versus high-capacity server baselines when on-device training is considered.
- Accuracy: Comparable or superior forecasting accuracy to centralized/server-trained methods; consistently outperforms device-friendly baselines (Local, small FedAvg/FedProx, Split, SFLV1, SFLV2). Example (BDG2): MAE improves by 5.62% over Split, 7.27% over FedAvg-S, 9.46% over FedProx-S, and 11.07% over Local-S.
- Optimal splitting: Efficiency-optimal split selection minimizes training time; up to 2.97× faster training compared to suboptimal split choices, without affecting accuracy.
- Parallelism and KD ablation: Parallel training cuts training time by up to 1.55× and per-round communication by 1.3×; adding knowledge distillation recovers and then surpasses the accuracy of the non-parallel variant at no extra memory cost.
- Semi-asynchronous aggregation: With heterogeneous meters, the two-stage approach cuts total training time by 3.11× and communication by 2.0× with minimal accuracy loss compared with synchronous aggregation, and it achieves better accuracy than purely asynchronous aggregation.
- Generality: Outperforms device-friendly benchmarks across horizons (12 h, 24 h) with RMSE/MAPE/MAE improvements up to 1.33%, 2.19%, 3.27% respectively; works with multiple backbones (MLP, CNN, RNN, GRU, LSTM), with MLP often strongest under tight memory.
- Downstream impact: Edge intelligence with improved forecasts reduces electricity cost (buildings: −31.79%; houses: −35.42%), increases renewable energy accommodation (buildings: +35.38%; houses: +40.38%), and lowers carbon emissions (buildings: −59.78%; houses: −49.31%). The proposed method further improves over the best competing intelligent baselines by up to a few percent in these metrics. Annual cost savings estimated at $1,176.11 per building and $18.93 per household.
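The two-stage aggregation evaluated above can be sketched roughly as follows. This is a simplified illustration rather than the paper's procedure: `group_by_speed` is a stand-in that sorts meters and cuts equal-size groups instead of running balanced k-means, the fixed mixing weight `alpha` is a hypothetical choice for the cloud's weighted update, and the meter models, speeds, and sample counts are made up.

```python
from typing import List, Sequence

Vector = List[float]


def group_by_speed(speeds: Sequence[float], n_groups: int) -> List[List[int]]:
    """Stand-in for balanced k-means: sort by speed, cut into equal-size groups.

    Assumes len(speeds) is divisible by n_groups.
    """
    order = sorted(range(len(speeds)), key=lambda i: speeds[i])
    size = len(speeds) // n_groups
    return [order[g * size:(g + 1) * size] for g in range(n_groups)]


def sync_aggregate(models: Sequence[Vector], sizes: Sequence[int]) -> Vector:
    """Edge step: wait for every meter in the cluster, then take a weighted mean."""
    total = sum(sizes)
    return [sum(m[i] * n for m, n in zip(models, sizes)) / total
            for i in range(len(models[0]))]


def async_update(global_model: Vector, edge_model: Vector, alpha: float) -> Vector:
    """Cloud step: blend in one edge model as soon as it arrives."""
    return [(1 - alpha) * g + alpha * e for g, e in zip(global_model, edge_model)]


# Six hypothetical meters: clock speed (MHz), local model, local sample count.
speeds = [168, 40, 160, 48, 84, 80]
models = [[1.0, 0.0], [3.0, 2.0], [2.0, 2.0], [4.0, 4.0], [0.0, 1.0], [2.0, 3.0]]
sizes = [10, 30, 20, 20, 10, 10]

clusters = group_by_speed(speeds, n_groups=3)  # slowest cluster first
edge_models = [
    sync_aggregate([models[i] for i in c], [sizes[i] for i in c]) for c in clusters
]

# Assume edges report fastest-first; the cloud never waits for the slow cluster.
global_model = [0.0, 0.0]
for edge_model in reversed(edge_models):
    global_model = async_update(global_model, edge_model, alpha=0.5)
print(clusters, global_model)
```

Grouping meters of similar speed means the synchronous wait inside each cluster is short, while the asynchronous cloud step lets fast clusters contribute without being held back by slow ones.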
The proposed end-edge-cloud federated split learning framework directly addresses the core challenges limiting smart meter intelligence: strict device memory/compute/communication constraints and privacy-preserving use of distributed data. By optimally splitting models, parallelizing end/edge training with an auxiliary head, and adding knowledge distillation, the approach achieves high accuracy without exceeding on-device memory budgets. The hierarchical semi-asynchronous aggregation balances training speed and convergence quality under device heterogeneity, alleviating straggler effects and communication bottlenecks. Empirical results on a hardware testbed confirm that the method yields substantial efficiency gains while maintaining or improving forecasting accuracy versus centralized and federated baselines. These forecasting gains translate into tangible benefits for energy management (lower costs, higher renewable accommodation, reduced emissions). The approach is model-agnostic and robust across forecasting ranges and neural backbones, indicating broad applicability for smart grids. Overall, the findings demonstrate a practical path to deploy edge intelligence at scale on existing smart meters without additional hardware investment, enabling more responsive, privacy-enhancing, and efficient demand-side analytics.
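The coupling between the auxiliary head and the main model mentioned above can be illustrated with a small loss function. The exact objective used in the paper is not given here, so this sketch assumes a mean-squared-error forecasting loss plus a mean-squared-error distillation term, with a hypothetical weight `kd_weight`; the main model's prediction is treated as a fixed teacher signal when the auxiliary gradient is computed.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def auxiliary_loss(aux_pred: Sequence[float], main_pred: Sequence[float],
                   target: Sequence[float], kd_weight: float) -> float:
    """Forecasting loss of the auxiliary head plus a distillation term."""
    return mse(aux_pred, target) + kd_weight * mse(aux_pred, main_pred)


def auxiliary_grad(aux_pred: Sequence[float], main_pred: Sequence[float],
                   target: Sequence[float], kd_weight: float) -> List[float]:
    """Gradient of auxiliary_loss with respect to each auxiliary prediction.

    main_pred is treated as a constant, so no gradient flows back to the edge.
    """
    n = len(aux_pred)
    return [
        2 * (a - t) / n + kd_weight * 2 * (a - m) / n
        for a, m, t in zip(aux_pred, main_pred, target)
    ]


# Hypothetical two-step forecast.
target = [1.0, 2.0]
main_pred = [1.5, 2.0]   # output of the main model (received from the edge path)
aux_pred = [2.0, 1.0]    # output of the lightweight auxiliary regressor

loss = auxiliary_loss(aux_pred, main_pred, target, kd_weight=0.5)
grad = auxiliary_grad(aux_pred, main_pred, target, kd_weight=0.5)
print(loss, grad)
```

Because the gradient depends only on values already available on the meter, the feature extractor can be updated through the auxiliary head without waiting for a gradient from the edge, which is the property the parallel training scheme relies on.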
This work introduces a unified, practical end-edge-cloud framework that integrates federated and split learning to enable on-device intelligence on resource-constrained smart meters. Key contributions include: (i) an efficiency-optimal model splitting strategy that respects memory limits and minimizes training time; (ii) a collaborative training pipeline with parallelism and knowledge distillation that reduces communication and accelerates convergence while preserving accuracy; (iii) a two-stage semi-asynchronous aggregation scheme that manages device heterogeneity and reduces delay; (iv) a real hardware validation on Cortex-M4-based meters showing large reductions in memory footprint, training time, and communication, with accuracy comparable or superior to server-based methods; and (v) demonstrated improvements in downstream energy management. Future directions include active device selection for robustness to data/communication quality, online/continual learning mechanisms for frequent updates, and personalization to handle data and device heterogeneity. Broader applications include on-device monitoring and control tasks for both consumers and DSOs.
- Devices with poor-quality data or unstable communication can degrade accuracy and training efficiency; active device selection is needed.
- Frequent data arrival necessitates online/continual updates to ensure timely model refreshes under resource constraints.
- Heterogeneity in data and devices limits a single global model’s suitability; personalization and automated model design are required.
- Current clustering for aggregation is hardware-aware but does not consider geographic/physical network topology of meters.
- Hardware validation used one representative smart meter configuration; broader compatibility across diverse meter cores and communication technologies remains to be demonstrated.