Earth Sciences
A low-cost post-processing technique improves weather forecasts around the world
T. D. Hewson and F. M. Pillosu
The study addresses a core limitation of global numerical weather prediction (NWP): forecasts are issued at coarse gridbox scales (about 20 km by 20 km), whereas users need site-specific (point) forecasts. Sub-grid variability in rainfall, especially during convective situations, causes point forecasts derived directly from gridbox means to fail. The research question is whether a low-cost, globally applicable statistical post-processing (PP) approach can transform ensemble gridbox forecasts into reliable, discriminative probabilistic point forecasts by accounting for weather-dependent sub-grid variability and grid-scale bias. The paper introduces ecPoint, a non-local, weather-type-based PP method designed to improve point rainfall forecasts worldwide, including extremes relevant for flash flooding.
Common strategies to address sub-grid variability include using higher-resolution convection-permitting models (~2 km) and statistical post-processing (e.g., MOS and modern ensemble PP). High-resolution models improve realism and skill but have limited spatial coverage due to computational cost. Classical PP methods often require multi-decade observations and reforecasts, assume parametric distributions with tail limitations, may not separate convective versus large-scale precipitation, and can lack global applicability. The paper references quantile regression forests and other machine-learning PP methods, noting challenges such as stationarity, training data scarcity, representativeness, and difficulty improving extremes. A prior global PP approach improved either reliability or resolution depending on threshold but assumed a single weather type and lacked verification for very large totals. The authors position ecPoint as addressing these issues via multiple gridbox-weather-types, nonparametric calibration, global non-local training, and explicit treatment of convective versus large-scale precipitation (Table 1).
Overview: The ecPoint methodology statistically converts each ensemble gridbox rainfall forecast into a probabilistic point forecast using a set of gridbox-weather-types and associated nonparametric mapping functions. It simultaneously corrects grid-scale bias and expands or contracts distribution tails according to the diagnosed weather type. The approach is non-local: calibration aggregates forecast–observation relationships globally whenever governing-variable conditions (weather type) are similar.
Gridbox-weather-types and governing variables: Weather types are defined by ranges of governing variables that physically relate to sub-grid rainfall variability and bias, for example: fraction of convective precipitation (P_conv), mid-tropospheric wind speed (e.g., 700 hPa wind, V700), CAPE, forecast total rainfall amount, and other classes of variables (raw model, computed, geographical, astronomical, including local solar time for diurnal cycle effects). The current operational system uses 214 types organized in a decision-tree structure.
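As a rough illustration of how ranges of governing variables could define gridbox-weather-types, the sketch below bins two variables and combines the bin indices, as if descending a two-level decision tree. The breakpoints, the choice of only two variables, and the function name `weather_type` are hypothetical; the operational tree has 214 types and its actual thresholds are not given here.

```python
from bisect import bisect_right

# Hypothetical breakpoints; the operational thresholds are not stated in this summary.
P_CONV_BREAKS = [0.25, 0.75]  # fraction of convective precipitation
V700_BREAKS = [5.0, 20.0]     # 700 hPa wind speed, m/s


def weather_type(p_conv: float, v700: float) -> int:
    """Return a type index (0..8) from two governing variables.

    Each variable is binned by its breakpoints, and the two bin indices
    are combined as if descending a two-level decision tree.
    """
    i = bisect_right(P_CONV_BREAKS, p_conv)  # 0, 1 or 2
    j = bisect_right(V700_BREAKS, v700)      # 0, 1 or 2
    return i * (len(V700_BREAKS) + 1) + j


if __name__ == "__main__":
    # Mostly convective rain in light winds, then mostly large-scale rain in strong winds.
    print(weather_type(0.9, 3.0))
    print(weather_type(0.1, 25.0))
```

Adding a governing variable to such a scheme multiplies the number of types, which is one reason a pooled, global calibration sample helps keep each type well populated.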
Calibration data: Calibration uses one year of global rain gauge point observations (12 h accumulations over land), paired with short-range unperturbed Control run forecasts as the gridbox predictor. This short period avoids climatological non-stationarity and drastically reduces reforecast requirements.
Forecast error ratio and mapping functions: For each observation–forecast pair where the gridbox forecast exceeds a minimal threshold (≥1 mm for stability), a non-dimensional forecast error ratio (FER) is computed to relate point rainfall to the gridbox forecast. Negative FER indicates over-prediction at the point; positive indicates under-prediction. Each pair is assigned to a gridbox-weather-type using the governing variables. Aggregating FER values over all sites and times for a given type yields a nonparametric FER probability density (a mapping function) representing the distribution of point outcomes within a gridbox for that type. Mapping functions also implicitly encode a grid-scale bias-correction factor C via the expected FER.
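The calibration step can be sketched as follows. The FER formula used here, (observation minus forecast) divided by forecast, is an assumption chosen to match the sign convention described above; the summary does not state the formula. The function names and the tuple layout are likewise illustrative.

```python
from collections import defaultdict

MIN_FORECAST_MM = 1.0  # pairs with a smaller gridbox forecast are skipped for stability


def forecast_error_ratio(obs_mm: float, fcst_mm: float) -> float:
    """Assumed definition: (observation - forecast) / forecast.

    Negative values mean the gridbox forecast over-predicted the point value;
    positive values mean it under-predicted.
    """
    return (obs_mm - fcst_mm) / fcst_mm


def build_mapping_functions(pairs):
    """Group FER values by weather type.

    `pairs` is an iterable of (type_id, obs_mm, fcst_mm) tuples.
    Returns {type_id: sorted list of FER values}, i.e. an empirical
    (nonparametric) FER distribution for each type.
    """
    by_type = defaultdict(list)
    for type_id, obs_mm, fcst_mm in pairs:
        if fcst_mm >= MIN_FORECAST_MM:
            by_type[type_id].append(forecast_error_ratio(obs_mm, fcst_mm))
    return {t: sorted(values) for t, values in by_type.items()}
```

Under this assumed definition a dry gauge always gives FER = -1, so a type dominated by localised convection would show a large mass at -1 together with a long positive tail.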
Bias correction: For each type j, the bias-correction factor C_j equals the ratio of mean observed to mean forecast rainfall (the expected value derived from the FER distribution). This corrects systematic model over- or under-prediction at gridbox scale and is informative about model physics errors (e.g., overprediction in certain orographically triggered convection regimes).
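A minimal sketch of the bias-correction factor for a single weather type, computed directly as the ratio of mean observed to mean forecast rainfall. The function name and the sample values are synthetic and purely illustrative.

```python
def bias_correction_factor(obs_mm, fcst_mm):
    """Ratio of mean observed to mean forecast rainfall for one weather type."""
    if len(obs_mm) != len(fcst_mm) or not obs_mm:
        raise ValueError("need equal-length, non-empty samples")
    mean_obs = sum(obs_mm) / len(obs_mm)
    mean_fcst = sum(fcst_mm) / len(fcst_mm)
    return mean_obs / mean_fcst


# Synthetic example: gauges average half of what the model predicted.
c_j = bias_correction_factor([0.0, 2.0, 4.0], [4.0, 4.0, 4.0])
corrected = c_j * 10.0  # bias-corrected value for a 10 mm gridbox forecast
```

A factor below 1 indicates that the model over-predicts at gridbox scale for that type; a factor above 1 indicates under-prediction.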
Forecast production (ensemble of ensembles): For each ensemble member and each gridbox/time, the governing variables determine the weather type and thus the mapping function. The member’s gridbox rainfall is first bias-corrected, then transformed into a probabilistic point rainfall distribution using the mapping function. Repeating this for all members yields an ensemble of probabilistic realisations (ensemble of ensembles). Operationally, each mapping function is represented by 100 outcomes; with 51 ensemble members this yields 5100 realisations per gridbox/period, which are distilled into percentiles (1st to 99th) for products. Outputs include: the weather type identifier, the bias-corrected gridbox forecast, the median across members of a member-wise high percentile (e.g., the median of the members' 95th percentiles), and the full point rainfall percentile fields.
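The pooling step can be sketched as below, assuming that each point outcome is the member's gridbox total multiplied by (1 + FER). Under that assumption the bias correction is carried by the mean of the FER outcomes, so no separate factor is applied in this sketch. The function names and the input layout are illustrative.

```python
from statistics import quantiles


def ensemble_of_ensembles(members):
    """Pool point-rainfall realisations over all ensemble members.

    `members` is a list of (gridbox_total_mm, fer_outcomes) pairs, where
    `fer_outcomes` are the FER values representing the mapping function of
    the weather type diagnosed for that member.
    """
    realisations = []
    for total_mm, fer_outcomes in members:
        for fer in fer_outcomes:
            # Assumed transformation; clamp at zero to guard against rounding.
            realisations.append(max(0.0, total_mm * (1.0 + fer)))
    return realisations


def point_percentiles(realisations):
    """Return the 1st to 99th percentiles of the pooled realisations."""
    return quantiles(realisations, n=100)


def exceedance_probability(realisations, threshold_mm):
    """Fraction of realisations at or above a rainfall threshold."""
    return sum(r >= threshold_mm for r in realisations) / len(realisations)
```

With 51 members and 100 FER outcomes per member, `ensemble_of_ensembles` returns 5100 values and `point_percentiles` returns 99 cut points.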
Verification strategy: One year of retrospective ecPoint and raw ensemble forecasts was verified against global SYNOP and high-density national gauge datasets using two categorical metrics: the reliability component of the Brier Score and the area under the ROC curve (ROCA), with a climatology-based baseline marking zero skill. Thresholds assessed were ≥0.2, ≥10, and ≥50 mm per 12 h. Verification and calibration periods were separate; tropical and extratropical subsets were also analysed.
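To make the discrimination metric concrete, the sketch below computes the area under the ROC curve from probability forecasts and binary outcomes for one threshold. It uses the rank-based (Mann-Whitney) form, which equals the trapezoidal ROC area when every distinct forecast probability is used as a decision threshold. How the study itself discretised probabilities is not stated here, so this is an illustration rather than the authors' procedure.

```python
def roc_area(probs, outcomes):
    """Area under the ROC curve via pairwise comparison.

    `probs` are forecast probabilities of exceeding a threshold;
    `outcomes` are truthy where the observation exceeded it.
    Returns the probability that a randomly chosen event received a higher
    forecast probability than a randomly chosen non-event (ties count 0.5).
    """
    events = [p for p, o in zip(probs, outcomes) if o]
    non_events = [p for p, o in zip(probs, outcomes) if not o]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                score += 1.0
            elif p_event == p_non:
                score += 0.5
    return score / (len(events) * len(non_events))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination. The pairwise loop is quadratic in sample size, so a sorted, rank-based variant would be preferable for large verification samples.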
- Global skill improvement: Across almost all lead times and thresholds (≥0.2, ≥10, ≥50 mm/12 h), ecPoint substantially improves reliability and discrimination versus the raw ensemble.
- Lead-time gains (ROCA-based): Relative to the raw ensemble, ecPoint gains approximately 1 day of lead time for ≥0.2 mm, 2 days for ≥10 mm, and 8 days for ≥50 mm, with the gains centred around day 5. For extreme totals (≥50 mm/12 h), ecPoint retains useful discrimination to at least day 5, while the raw ensemble has limited utility even on day 1.
- Reliability: Marked improvements for the dry/not-dry threshold (≥0.2 mm/12 h), linked to weather-type logic that can assign near-zero point rain probabilities in convective cases despite nonzero gridbox means.
- Tail handling: Added value stems from better treatment of wet-tail percentiles (e.g., 98th/99th), extending the tail in convective regimes and improving extreme rainfall probabilities relevant for flash flood risk.
- Regional performance: Absolute ROCA increases and lead-time gains are larger in the tropics, where convection is more frequent, but gains in the extratropics are only about 25% lower.
- Case studies: For the 25 Feb 2019 Crete floods (>75 mm/24 h widespread, locally up to 373 mm/24 h), ecPoint fields were smoother and less jumpy across runs, gave higher probabilities for large thresholds, and were better focused spatially than the raw ensemble, supporting improved warnings. A Norway winter cyclonic-convective case showed ecPoint reducing the upper tail because of a large diagnosed over-forecast bias, aligning better with gauges and radar.
- Addressing known PP challenges: Table 1 highlights that ecPoint achieves global coverage; needs only ~1 year of calibration observations and control reforecasts; uses nonparametric distributions with unconstrained tails; improves extremes; is robust to occasional bad data; and provides physically interpretable mapping functions that expose model biases.
- Operational feasibility: The approach is computationally cheap relative to high-resolution global NWP and has been delivered experimentally in real time since April 2019.
The findings demonstrate that a weather-type-based, non-local calibration framework can translate coarse-grid ensemble rainfall forecasts into accurate, reliable, and discriminative probabilistic point forecasts globally. By conditioning on physically meaningful governing variables (e.g., convective fraction, wind speed, CAPE, forecast totals), ecPoint captures sub-grid variability structures and corrects grid-scale biases in a manner that generalizes across regions and seasons. The large lead-time gains, especially for extremes, directly address the sub-grid mismatch problem and significantly extend the range of actionable flash flood risk forecasts. Compared with existing PP methods, ecPoint’s multiple weather types enable better reliability and resolution simultaneously, while preserving interpretability that informs model development (e.g., identification of overprediction in specific topographic–convective regimes). The improved performance in convective contexts targets a primary global NWP weakness, and smoother, less jumpy probability fields enhance forecaster usability.
This paper introduces ecPoint, a low-cost, globally applicable statistical post-processing method that converts ensemble gridbox rainfall forecasts into probabilistic point forecasts using weather-type-conditioned, nonparametric mapping functions with embedded bias correction. Verification shows substantial gains in reliability and discrimination across thresholds and lead times, including strong improvements for extreme rainfall, extending useful predictability to around day 5. The approach addresses longstanding PP challenges (training data, stationarity, tail behavior, extremes, global coverage) while providing physically interpretable insights into NWP model biases. Future directions include: expanding governing variables and optimizing the decision tree with tailored machine learning while preserving interpretability; incorporating more high-density and crowdsourced observations and shorter accumulation periods (<6 h); blending with convection-resolving limited-area ensemble PP for seamless scales; producing global pointwise re-analyses from ERA5 back to 1950 and developing climatologies for extremeness indices; and applying ecPoint concepts to other weather variables and to climate downscaling applications.
- Training data extremes: Rare global record-setting conditions may be underrepresented in a one-year calibration, potentially limiting performance in unprecedented events, though mapping functions can still extrapolate better than some parametric methods.
- Local idiosyncrasies: At specific sites with unusual topography or microclimates, local MOS-type approaches may outperform ecPoint, and regional performance variations are expected.
- Diurnal cycle and timing: Remaining systematic errors in the modelled diurnal cycle of convection, together with variations in observation density, induce UTC-dependent oscillations in the verification scores; ongoing work includes 6 h accumulations and local solar time as a governing variable.
- Orographic effects: Some orographic enhancement deficits in the raw model may not be fully corrected by ecPoint in certain cases.
- Data and metrics: Gauge data have known issues (e.g., undercatch). Verification against point observations inherently limits achievable discrimination for grid-based forecasts due to sub-grid variability. Day-1 reliability differences may reflect ensemble perturbation design and model spin-up effects.
- Calibration thresholds: Stability constraints (e.g., excluding very small gridbox totals) and using the 99th percentile for verification may truncate information in the extreme tail relevant to very low-probability events.