Earth Sciences

Accurate medium-range global weather forecasting with 3D neural networks

K. Bi, L. Xie, et al.

Discover Pangu-Weather, an innovative AI-driven approach to global weather forecasting that surpasses the operational forecasts of the European Centre for Medium-Range Weather Forecasts (ECMWF). Developed by Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian from Huawei Cloud, this method excels in extreme weather prediction and offers significantly faster computation times.

Introduction

The study asks whether AI-based approaches can match or surpass the accuracy of leading numerical weather prediction (NWP) systems for medium-range global forecasting while offering dramatic speedups. NWP solves discretized partial differential equations for atmospheric dynamics but is computationally expensive and relies on parameterizations that introduce errors. Recent deep learning models can forecast much faster on GPUs, yet their accuracy has lagged behind operational NWP. The authors propose that explicitly incorporating vertical structure via 3D neural networks, and mitigating error accumulation at longer lead times, can enable AI systems to close the accuracy gap with NWP and even surpass it on reanalysis-based deterministic forecasts, with potential benefits for extreme weather prediction and ensemble forecasting.

Literature Review

The paper situates its contribution within prior work on NWP systems (e.g., ECMWF IFS and ensemble systems) and the growing body of AI-based forecasting, including FourCastNet and other deep learning approaches for medium-range forecasting and benchmarking (e.g., WeatherBench). Prior reports indicated that while AI offered substantial speed advantages, accuracy remained below operational NWP, with surveys calling for fundamental breakthroughs before AI could outperform NWP. The authors build on vision transformer architectures (e.g., Swin Transformer) and prior ensemble perturbation strategies, and evaluate on ERA5 reanalysis, a standard dataset for atmospheric modeling assessment.

Methodology

The authors develop Pangu-Weather, an AI-based system built on three-dimensional Earth-specific transformer (3DEST) networks and a hierarchical temporal aggregation strategy.

Data: ERA5 reanalysis at hourly temporal resolution, with 1979–2017 used for training (341,880 time points), 2019 for validation, and 2018 for testing.

Variables: 69 factors, comprising five upper-air variables on 13 pressure levels (50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1,000 hPa) plus four surface variables (5 × 13 + 4 = 69), all at 0.25° × 0.25° spatial resolution.

Networks and training: Four separate models are trained, with lead times of 1 h (FM1), 3 h (FM3), 6 h (FM6), and 24 h (FM24). Each model is trained for 100 epochs, which takes about 16 days on a cluster of 192 NVIDIA Tesla V100 GPUs. To reduce overfitting, the order of training samples is randomly permuted in each epoch.

Architecture: The upper-air variables (13 pressure levels) and surface variables pass through patch embedding and are combined into a 3D cube with an explicit vertical (pressure-level) dimension. A 16-block encoder–decoder derived from the Swin Transformer processes this 3D data. An Earth-specific positional bias replaces the relative positional bias to encode geophysical priors; it has 527× as many bias parameters as the relative positional bias (about 64M parameters in total per model), yet adds no computational cost and speeds up convergence. The output is split back into upper-air and surface variables, and patch recovery restores the original resolution.

Inference strategy: For medium-range forecasting (≥7 days), iteratively applying a single base model would accumulate errors. Hierarchical temporal aggregation instead greedily applies the model with the largest lead time that does not overshoot the target, reducing the number of iterative steps and therefore the cumulative error. For example, a 56 h forecast uses FM24 twice, FM6 once, and FM1 twice (2 × 24 + 6 + 2 × 1 = 56 h), that is, five model calls instead of 56 hourly steps.

Computational performance: Inference takes about 1.4 s on a single GPU for a deterministic forecast.
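The greedy schedule described under Inference strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `aggregation_schedule` and `rollout` are hypothetical names, and each base model is represented by an arbitrary callable that maps one atmospheric state to the next. Because each lead time in (24, 6, 3, 1) divides the next larger one, the greedy choice also gives the minimum number of model calls for this set of models.

```python
from __future__ import annotations

from typing import Callable, TypeVar

State = TypeVar("State")

# Lead times (hours) of the four base models: FM24, FM6, FM3, FM1.
BASE_LEAD_TIMES = (24, 6, 3, 1)


def aggregation_schedule(target_hours: int, base: tuple[int, ...] = BASE_LEAD_TIMES) -> list[int]:
    """Greedily split a target lead time into base-model steps, largest first."""
    if target_hours < 0:
        raise ValueError("target lead time must be non-negative")
    steps: list[int] = []
    remaining = target_hours
    for lead in sorted(base, reverse=True):
        # Use this model as many times as it fits without overshooting the target.
        count, remaining = divmod(remaining, lead)
        steps.extend([lead] * count)
    if remaining:
        raise ValueError(f"{target_hours} h cannot be reached with lead times {base}")
    return steps


def rollout(state: State, target_hours: int, models: dict[int, Callable[[State], State]]) -> State:
    """Apply hypothetical base models (lead time -> step function) along the schedule."""
    for lead in aggregation_schedule(target_hours, tuple(models)):
        state = models[lead](state)
    return state


if __name__ == "__main__":
    print(aggregation_schedule(56))  # [24, 24, 6, 1, 1]
    # Toy "models" that only advance a clock, to show the call sequence.
    toy_models = {h: (lambda t, h=h: t + h) for h in BASE_LEAD_TIMES}
    print(rollout(0, 56, toy_models))  # 56
```

A 7-day (168 h) forecast under this scheme is seven FM24 calls, whereas chaining FM1 alone would take 168 calls, each adding its own error.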
Ensemble method: A 100-member ensemble is built from the initial state plus 99 randomly perturbed copies of it (details in the paper's Methods); the members' forecasts are averaged to form the ensemble mean, and performance is evaluated via RMSE, CRPS, and the spread-skill ratio.
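The three ensemble metrics named above can be illustrated on scalar values. The sketch below is a simplification and not the paper's evaluation code: it ignores latitude weighting and any finite-ensemble correction, uses the common empirical CRPS estimator E|X - y| - 0.5 E|X - X'|, and defines spread as the square root of the mean ensemble variance. The function names are hypothetical. A spread-skill ratio below 1 means the members disagree with one another less than the ensemble mean misses the truth, which is what underdispersion means.

```python
from __future__ import annotations

import math
from statistics import fmean


def crps_ensemble(members: list[float], obs: float) -> float:
    """Empirical CRPS for one scalar observation: E|X - y| - 0.5 * E|X - X'|."""
    n = len(members)
    accuracy = sum(abs(x - obs) for x in members) / n
    # Mean absolute difference over all ordered member pairs, halved.
    dispersion = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return accuracy - dispersion


def ensemble_scores(forecasts: list[list[float]], truth: list[float]) -> dict[str, float]:
    """forecasts[k] holds the ensemble members for verification point k."""
    means = [fmean(members) for members in forecasts]
    # Skill: RMSE of the ensemble mean against the truth.
    rmse = math.sqrt(fmean((m - y) ** 2 for m, y in zip(means, truth)))
    # Spread: square root of the average (population) ensemble variance.
    spread = math.sqrt(fmean(
        fmean((x - m) ** 2 for x in members) for members, m in zip(forecasts, means)
    ))
    crps = fmean(crps_ensemble(members, y) for members, y in zip(forecasts, truth))
    return {
        "rmse_of_mean": rmse,
        "spread": spread,
        "spread_skill_ratio": spread / rmse if rmse > 0 else math.inf,
        "crps": crps,
    }


if __name__ == "__main__":
    # Two verification points, two members each (toy numbers).
    scores = ensemble_scores([[1.0, 3.0], [2.0, 4.0]], [2.0, 5.0])
    print(scores)  # spread_skill_ratio is about 0.707 < 1: underdispersive
```

With a single member, the CRPS reduces to the absolute error, which is why CRPS lets deterministic and ensemble forecasts be compared on one scale.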

Key Findings

- Deterministic accuracy on ERA5 (2018 test year): Pangu-Weather outperforms ECMWF's operational IFS and FourCastNet for all tested upper-air and surface variables at lead times from 1 h to 168 h (7 days). Typical RMSE reductions are ~10% versus IFS and ~30% versus FourCastNet. Example: the 5-day Z500 RMSE is 296.7 m²/s² for Pangu-Weather versus 333.7 m²/s² (IFS) and 462.5 m²/s² (FourCastNet).
- Forecast time gain: Pangu-Weather reaches the accuracy of competing systems at longer lead times, typically 10–15 h later than IFS, more than 24 h later for some variables (e.g., specific humidity), and up to ~40 h later than FourCastNet.
- Speed: Single-member inference takes ~1.4 s on one GPU, more than 10,000× faster than operational IFS and comparable to FourCastNet.
- Visual quality: Pangu-Weather forecasts are smoother than IFS fields owing to regression tendencies, while remaining close to the ERA5 ground truth.
- Tropical cyclone tracking (2018, 88 named storms): Tracking MSLP minima in reanalysis-initialized runs, Pangu-Weather shows lower mean direct position errors than ECMWF-HRES: 120.29 km vs 162.28 km at 3 days and 195.65 km vs 272.10 km at 5 days, with the advantage growing with lead time. Case studies (Kong-rey, Yutu) show Pangu-Weather predicting the correct track earlier than ECMWF-HRES.
- Ensembles (100 members via initial-state perturbations): The ensemble mean is slightly worse than the single-member forecast at short ranges (~1 day) but significantly better at 5–7 days, particularly for non-smooth variables such as Q500 and U10. CRPS improves at longer lead times, and a spread-skill ratio below 1 indicates that the current ensemble is underdispersive.
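The direct position error used in the cyclone comparison can be sketched as follows: take the MSLP minimum on a forecast grid as the storm center and measure the great-circle distance to the observed (best-track) position. This is a simplified, hypothetical tracker rather than the one used in the paper; practical trackers usually restrict the search to a window around the previous center and apply extra checks, which are omitted here. The haversine formula and the 6,371 km mean Earth radius are assumed choices, and all function names are illustrative.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def locate_center(mslp: list[list[float]], lats: list[float], lons: list[float]) -> tuple[float, float]:
    """Storm center = grid point with the lowest MSLP; mslp[i][j] lies at (lats[i], lons[j])."""
    i, j = min(
        ((i, j) for i in range(len(lats)) for j in range(len(lons))),
        key=lambda ij: mslp[ij[0]][ij[1]],
    )
    return lats[i], lons[j]


def mean_direct_position_error(forecast: list[tuple[float, float]],
                               best_track: list[tuple[float, float]]) -> float:
    """Average great-circle distance (km) between matched forecast and observed centers."""
    errors = [great_circle_km(*f, *b) for f, b in zip(forecast, best_track)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy 2 x 2 MSLP patch (hPa) on a 0.25° grid.
    center = locate_center([[1010.0, 1000.0], [995.0, 1005.0]], [20.0, 19.75], [130.0, 130.25])
    print(center)  # (19.75, 130.0)
    print(great_circle_km(0.0, 0.0, 0.0, 1.0))  # about 111.19 km per degree at the equator
```

Averaging this distance over all matched storm positions at a given lead time yields figures comparable in kind to the 3-day and 5-day errors quoted above.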

Discussion

The results demonstrate that explicitly modeling the vertical dimension with 3D Earth-specific transformers and reducing iterative steps via hierarchical temporal aggregation enables an AI system to surpass a leading NWP system on reanalysis-based medium-range deterministic forecasts across diverse variables. The improved accuracy, combined with orders-of-magnitude speedups, allows practical large-member ensemble forecasting and improved tracking of extreme events such as tropical cyclones. The smoother character of AI forecasts reflects regression properties and can differ from NWP fields derived from PDE solutions subject to initial condition and sub-grid uncertainties. The observed forecast time gains across variables, especially humidity, suggest AI’s strength in learning from large datasets where NWP parameterizations struggle. Ensemble results indicate that AI ensembles can enhance medium-range skill economically, though calibration and dispersion remain areas for improvement. Overall, findings support AI as a powerful complement or surrogate to NWP under reanalysis initialization, with potential to transform operational workflows when adapted to observational inputs and integrated with NWP.

Conclusion

Pangu-Weather introduces two key advances, 3D Earth-specific transformer architectures and hierarchical temporal aggregation, which together deliver state-of-the-art deterministic medium-range forecast accuracy on ERA5 reanalysis while running more than 10,000× faster than operational IFS. The system also performs strongly on extreme weather tracking and enables efficient ensemble forecasting. Future work should adapt the approach to observational initial conditions, incorporate additional variables (e.g., precipitation) and vertical levels, extend to 4D spatiotemporal networks, scale up models and training durations, improve ensemble dispersion and calibration, and explore hybrid AI–NWP integration for even stronger performance.

Limitations

- Data domain mismatch: Models are trained and evaluated on reanalysis data, whereas operational systems use observational data; performance under observational initialization still needs investigation.
- Variable coverage: Some key variables (e.g., precipitation) are not modeled, which may limit capability for small-scale extreme events (e.g., tornado outbreaks).
- Smoothing/underestimation risk: The regression tendencies of AI models can smooth fields and may underestimate the magnitude of extremes.
- Temporal consistency: Combining models with different lead times in hierarchical aggregation can introduce temporal inconsistencies.
- Fairness of the cyclone-tracking comparison: ECMWF-HRES uses IFS initial conditions, while Pangu-Weather uses reanalysis, so the two systems are not compared on identical inputs.
- Ensemble underdispersion: The current perturbation-based ensemble has a spread-skill ratio below 1, indicating underdispersion and the need for improved ensemble design and calibration.
