Balancing Energy Efficiency and Distributional Robustness in Over-the-Air Federated Learning

M. Badi, C. B. Issaid, et al.

Mohamed Badi, Chaouki Ben Issaid, Anis Elgabli, and Mehdi Bennis present a method for improving energy efficiency in federated learning via over-the-air computation, addressing key challenges in distributed learning by achieving substantial energy savings while remaining robust to heterogeneous client data.
Introduction

The paper addresses the problem of training federated learning (FL) models over heterogeneous client data under strict energy, bandwidth, and latency constraints in wireless systems. Conventional FL can be unfair when data distributions differ across clients, while wireless communication adds scalability and energy challenges. The authors build on distributionally robust optimization (DRO) for FL (agnostic federated learning, AFL) to ensure robustness to data heterogeneity, and combine it with over-the-air computation (AirComp) to improve communication efficiency. The key research question is how to select clients in each communication round so as to balance energy efficiency (by exploiting channel conditions) against distributional robustness (by focusing on worst-case clients) within an over-the-air aggregation framework. Contributions: (1) the first joint treatment of energy efficiency and robustness to data heterogeneity in over-the-air FL; (2) a tunable client selection mechanism that interpolates between non-channel-aware robust AFL and a greedy energy-minimizing scheduler; (3) empirical validation showing substantial energy savings with negligible robustness degradation compared to robust baselines, and improved fairness versus energy-centric baselines.
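For reference, the AFL/DRO objective discussed above takes the standard min-max form (generic notation, not necessarily the paper's own):

```latex
\min_{w} \; \max_{\lambda \in \Delta_N} \; \sum_{i=1}^{N} \lambda_i F_i(w),
\qquad
\Delta_N = \Big\{ \lambda \in \mathbb{R}_{\ge 0}^{N} : \textstyle\sum_{i=1}^{N} \lambda_i = 1 \Big\},
```

where F_i is client i's local empirical loss and the adversarial weights λ on the probability simplex up-weight the worst-performing clients.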

Literature Review

Federated learning (FedAvg) enables on-device training with model aggregation, reducing raw data transfer (McMahan et al., 2017). Robustness to data heterogeneity is addressed by agnostic federated learning (AFL), a min-max DRO formulation with a descent-ascent algorithm selecting client subsets each round (Mohri et al., 2019). Extensions consider distributionally robust federated averaging and decentralized robust learning with communication efficiency (Deng et al., 2020; Zecchin et al., 2022; Issaid et al., 2022). For wireless FL, over-the-air computation (AirComp) leverages waveform superposition to achieve scalable aggregation and lower latency, with demonstrated benefits over digital schemes (Amiri & Gündüz, 2019; IEEE TSP 2020; Elgabli et al., 2021). Dynamic client scheduling that mixes gradient norms and channel conditions (Du et al., 2023, GCA) offers energy and convergence gains but relies on heuristic indicators and tuning, with unpredictable scheduled set sizes. The present work integrates DRO-based robustness with AirComp and a principled, tunable, channel-aware client selection to balance energy efficiency and robustness.

Methodology

System model: A parameter server (PS) coordinates N edge devices using AirComp over OFDM. In each communication round t, only K clients are scheduled to transmit analog model updates over the uplink, exploiting waveform superposition for over-the-air aggregation; the aggregated signal per subcarrier is a weighted sum of the scheduled clients' transmissions plus noise. After aggregation, the PS broadcasts the updated global model to all clients over a common downlink, and a bidirectional control channel carries lightweight control information (e.g., probability updates). The learning objective follows a distributionally robust (min-max) formulation over a probability simplex on client weights, as in AFL, to handle data heterogeneity.

Energy model: Uplink transmission employs channel inversion, so each scheduled client's transmission energy is driven by the power needed to invert its effective channel (separate from the symbol power, which reflects learning dynamics). The per-round energy E^(t) is the sum of these costs over the scheduled clients; to isolate the effect of scheduling, the analysis focuses on the channel-inversion component, which favors clients with stronger effective channels. Channels are assumed i.i.d. truncated Rayleigh with block fading and uplink-downlink symmetry.

Client selection design: The core idea is to combine two probability mass functions (PMFs): (i) an energy-conservative PMF y_i^(t) that biases sampling toward clients with better effective channels, and (ii) the robustness PMF λ_i^(t) from the AFL descent-ascent algorithm. The energy PMF is y_i^(t) proportional to (1/h_i^(t))^(1+C), where h_i^(t) is the effective channel and C ≥ 0 is a tunable energy-conservation factor; as C increases, y becomes more sharply biased, prioritizing clients with strong channels (low inversion energy). The final sampling probabilities are p_i^(t) proportional to y_i^(t) λ_i^(t), normalized across clients (a product-of-experts style combination).
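The product-of-experts combination above can be sketched in a few lines. Note one loud assumption: the energy weighting here is taken as y_i ∝ h_i^C, chosen purely so that the two stated limits (C = 0 gives non-channel-aware sampling, large C greedily favors strong channels) hold exactly in the sketch; the paper's exact exponent and normalization may differ.

```python
import numpy as np

def selection_probs(h, lam, C):
    """Combine an energy-conservative PMF with the AFL robustness PMF.

    h   : effective channel gains, shape (N,)
    lam : AFL simplex weights lambda, shape (N,)
    C   : energy-conservation factor (C >= 0)

    Assumption (illustrative, not the paper's exact form): the energy PMF
    is y_i proportional to h_i**C, so C = 0 recovers purely
    robustness-driven sampling and large C concentrates on strong channels.
    """
    y = h ** C
    y = y / y.sum()        # energy-conservative PMF
    p = y * lam            # product-of-experts combination
    return p / p.sum()     # renormalize over clients

# Hypothetical round: 100 clients, truncated Rayleigh channels, uniform lambda.
rng = np.random.default_rng(0)
h = rng.rayleigh(scale=1.0, size=100).clip(min=0.05)
lam = np.full(100, 1 / 100)
scheduled = rng.choice(100, size=40, replace=False,
                       p=selection_probs(h, lam, C=8))
```

With C = 0 and uniform λ the sampling is uniform; cranking C up concentrates the mass on the clients with the strongest channels, mirroring the greedy limit described next.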
Limiting behaviors:
• As C → 0, y becomes unbiased and p_i reduces to AFL's λ_i, i.e., non-channel-aware robust client selection.
• As C → ∞, selection converges to a greedy top-K choice of the clients with the best channels (lowest inversion energy) each round.

Algorithm (CA-AFL): In each round t, the PS computes p^(t) from the current channels and λ^(t) and samples K clients for the descent step according to p^(t); these clients run local updates (mini-batch size ζ) and transmit analog model updates via AirComp, after which the PS aggregates and broadcasts the global model. For the ascent step, the PS uniformly samples K clients to estimate gradients of the DRO objective (mini-batch size ξ) and updates the simplex weights λ^(t); the λ updates incur only small control overhead and are sent over the control channel. The scheme thus blends robustness-driven and energy-driven sampling through C, providing a continuous trade-off between fairness/robustness and energy efficiency.
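One scheduling round of this loop can be sketched as follows. This is an illustrative skeleton under several assumptions: local SGD and AirComp aggregation are abstracted away, the energy PMF is again taken as h^C for simplicity, the ascent gradient with respect to λ is approximated by the sampled clients' losses (unbiasedly rescaled), and λ is kept on the simplex via a Euclidean projection; the paper's exact update rules may differ.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def ca_afl_round(h, lam, losses, C=8, K=40, eta=8e-3, rng=None):
    """One scheduling round of the CA-AFL loop (illustrative sketch).

    `losses` stands in for the per-client DRO objective values that the
    uniformly sampled ascent clients would report.
    """
    if rng is None:
        rng = np.random.default_rng()
    N = len(h)
    # Descent step: channel- and robustness-aware sampling of K clients.
    y = h ** C                    # assumed energy-PMF form (see lead-in)
    p = y * lam
    p = p / p.sum()
    descent_clients = rng.choice(N, size=K, replace=False, p=p)
    # (selected clients would run local updates and transmit via AirComp;
    # the PS would then aggregate and broadcast the global model)
    # Ascent step: uniform sampling to estimate the gradient w.r.t. lambda,
    # which for the AFL objective is the vector of per-client losses.
    ascent_clients = rng.choice(N, size=K, replace=False)
    grad = np.zeros(N)
    grad[ascent_clients] = losses[ascent_clients] * (N / K)  # unbiased estimate
    lam = project_simplex(lam + eta * grad)
    return descent_clients, lam
```

Only the small λ vector and the sampling probabilities travel over the control channel; the model updates themselves ride the analog AirComp uplink.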

Key Findings

Experimental setup: Fashion-MNIST with a logistic regression model (M = 7,850 parameters); N = 100 clients, K = 40 scheduled per round, T = 500 rounds, batch size 50. The descent learning rate starts at 0.1 and decays by a factor of 0.998 per iteration; the ascent step size is 8e-3. Channels are i.i.d. truncated Rayleigh (h ≥ 0.05) with no power control (a setting favorable to non-channel-aware AFL), a coherence time of one round, a scaling factor of 0.5 mW, and a symbol period of 1 ms. Baselines: FedAvg, AFL (robust, non-channel-aware), and GCA (gradient- and channel-aware dynamic scheduling with tuned hyperparameters, averaging 42 scheduled clients).

Results:
• Average accuracy: All methods reach about 80% average test accuracy; CA-AFL with higher C achieves comparable accuracy at substantially lower energy than AFL and FedAvg.
• Worst-client accuracy: CA-AFL nearly matches AFL, with negligible degradation at C = 8, and achieves about 10% higher worst-client accuracy than FedAvg and GCA; it reaches 50% worst-client accuracy in less than half the rounds required by FedAvg and GCA.
• Fairness (STD of accuracy across clients): CA-AFL attains a lower STD than GCA and FedAvg, approaching AFL's STD for small C (e.g., C = 2) and remaining competitive at C = 8, where it matches GCA's STD with less energy.
• Energy efficiency: FedAvg and AFL are the most energy-intensive because they lack channel awareness. Increasing C in CA-AFL yields large energy savings without sacrificing accuracy; at C = 8, CA-AFL matches GCA's energy efficiency while delivering better worst-client accuracy and fairness, and it can match AFL's performance while consuming roughly one-third of AFL's energy (about 3× savings).

Overall, CA-AFL achieves significant energy reductions with minimal or no loss in robustness metrics versus AFL, and outperforms GCA in worst-client accuracy and fairness at similar energy budgets.

Discussion

The results show that combining an energy-aware PMF with the DRO-based robustness PMF via a product-of-experts formulation effectively balances competing objectives. By tuning C, the method transitions smoothly from robust, non-channel-aware AFL to a greedy energy-minimizing scheduler, allowing practitioners to target desired points on the energy–robustness trade-off. Empirically, CA-AFL preserves average accuracy and closely matches worst-client accuracy of robust AFL while substantially reducing energy, indicating that channel-aware sampling can be integrated without undermining distributional robustness. Compared to a heuristic energy-centric scheduler (GCA), CA-AFL attains better fairness (lower STD) and worst-client performance at similar or lower energy, highlighting the benefit of principled integration of robustness and channel information. These findings directly address the research objective of enabling distributionally robust FL over wireless with strong energy efficiency under AirComp aggregation.

Conclusion

The paper proposes CA-AFL, a channel-aware agnostic federated learning algorithm that integrates an energy-conservative client selection PMF with the AFL robustness PMF through a tunable product-of-experts scheme. The tuning factor C yields a continuum between robust but non-channel-aware AFL and a fully energy-greedy top-K scheduler. Simulations on Fashion-MNIST demonstrate that CA-AFL achieves substantial energy savings—down to roughly one-third of AFL’s energy—while maintaining comparable average and worst-client accuracy and improving fairness relative to energy-centric baselines. The work shows that energy efficiency and distributional robustness can be jointly addressed in over-the-air federated learning via a lightweight, configurable sampling mechanism.

Limitations

• The evaluation is simulation-based on a single dataset (Fashion-MNIST) with a simple logistic regression model; generalization to deeper models and diverse datasets is not demonstrated.
• The approach focuses on the energy component due to channel inversion (symbol-power effects are not considered in scheduling), which may not capture all practical energy costs.
• Increasing the energy bias (larger C) degrades worst-client accuracy and fairness relative to robust AFL, a trade-off that must be tuned per scenario.
• The channel assumptions (i.i.d. truncated Rayleigh fading, coherence over one round, uplink-downlink symmetry) and the absence of power control are idealized and may limit applicability in more complex wireless settings.
