A large ensemble illustration of how record-shattering heat records can endure

Environmental Studies and Forestry


J. S. Risbey, D. B. Irving, et al.

This study by James S. Risbey and colleagues examines record‑shattering temperature extremes, showing how rare such days are and why very large climate model ensembles are needed to understand them. It traces these events to chance alignments of weather patterns.
Introduction

The study examines why record‑shattering heat records at fixed locations can endure for decades without being surpassed, even amid ongoing climate warming. Using the June 2021 Pacific Northwest heatwave as motivation and case context (SeaTac Airport, USA), the authors focus on local (station‑scale) daily maximum temperature extremes (TXx) that are outliers far beyond prior records. They highlight challenges with interpreting such extremes: observational records are short and nonstationary; extreme value fits are sensitive to sample size and influential points; and climate models may not reproduce such outliers at fixed locations due to limited sampling rather than process failures. The purpose is to use a very large hindcast ensemble to illustrate how chance alignment of weather patterns and antecedent dryness governs the rarity and persistence of record‑shattering extremes, to quantify sampling requirements (return periods and sample sizes), and to clarify implications for interpreting observations and climate projections.

Literature Review

The paper situates record‑shattering extremes within prior work on climate extremes, heatwaves, and extreme value theory. It notes that for aggregated variables (e.g., global monthly means), trends dominate variability and records are often attributable to climate change (Rahmstorf & Coumou, 2011), while for local daily extremes variability can dominate, muting changes in record frequency (Redner & Petersen, 2006; Rahmstorf & Coumou, 2011). Previous studies show that the 2021 Pacific Northwest (PNW) heatwave was an extreme outlier and that such events will become more common in a warmer world (McKinnon & Simpson, 2022; Thompson et al., 2022), and that very large ensembles can generate model outliers comparable to observed extremes (Barriopedro et al., 2011; Gessner et al., 2021). The sensitivity of rare‑event estimates to sample size has been documented in both statistics and modeling (Ailliot et al., 2011; Sippel et al., 2015; Paciorek et al., 2018; Annan & Hargreaves, 2011; Coats & Mankin, 2016; Mankin et al., 2020). Synoptic drivers of extreme heat over the PNW include blocking highs and coastal troughs, with soil moisture playing a modulating role (Abatzoglou & Barbero, 2014; Emerton et al., 2022; Neal et al., 2022; Oertel et al., 2023).

Methodology
  • Event definition and observations: Define record‑shattering extremes as events that set a record by a wide margin and act as demonstrable outliers in the population. Use SeaTac Airport (Seattle‑Tacoma International) station daily maximum temperature (GHCN‑D) from 1948–present; the all‑time daily record is 42.2°C on 28 June 2021. Use TXx (annual maximum of daily maximum temperature) to characterize yearly hot extremes.
  • Extreme value analysis: Fit Generalized Extreme Value (GEV) distributions to observed TXx. Assess sensitivity to inclusion/exclusion of specific years (leave‑one‑out, Cook’s distance–like approach). Show that excluding the previous record (2009) truncates the warm tail by ~2°C, and including 2021 extends the warm tail by ~3°C, indicating 2021 is a demonstrable outlier. Report observed GEV parameters: k = −0.22, σ = 2.51, μ = 33.1.
  • Large ensemble hindcasts: Use the ACCESS‑D decadal forecast system, based on GFDL CM2.1 (ocean MOM5.1). Hindcasts run to 10‑year leads and are initialized on May 1 and Nov 1 of each year from 1995 to 2020, with 96 members per start. For TXx, only complete calendar years are used, giving 26 years × 9 lead years × 2 starts × 96 members = 44,928 sample years (more than 16 million days). Independence among members is verified via near‑zero TXx correlations across members at given leads.
  • Bias correction and model‑obs mismatch: Use the model gridbox containing SeaTac; adjust warm bias by removing daily mean model‑obs difference by day‑of‑year and lead time: f′(d,τ)=f(d,τ)−(⟨f⟩(d,τ)−⟨o⟩(d)). The correction aligns overall means (KS test indicates broad statistical consistency) but not necessarily the hottest‑day mean. Note gridbox vs point mismatch and model process biases; also the model sample period (1995–2030) covers the warmer part of the observational record.
  • Model GEV characterization: Provide best‑fit GEV for bias‑corrected model TXx histogram (k = −0.20, σ = 2.33, μ = 34.8). Compare model and observed TXx histograms and fits.
  • Return period estimation: Estimate return periods for SeaTac‑sized record threshold TXx ≥ 42.2°C using two approaches across varying sample sizes: (1) direct sampling from the full 44,928‑year model population (random subsets, repeated 1000 times each sample size), and (2) GEV fits to the same subsamples to infer return periods. Evaluate uncertainty (interquartile ranges, whiskers, outliers) and minimum sample sizes needed for stable estimates.
  • Synoptic pattern analysis: For all JJA days (~4.59 million), compute pattern matching between 500 hPa geopotential height (Z500) on the ensemble’s hottest SeaTac grid day and every other day, using pattern correlation and RMSE over a ±30° lat/long domain centered on SeaTac. Examine relationship between match quality and daily maximum temperature. Repeat for different domains to assess sensitivity.
  • Soil moisture modulation: For days when TXx occurs, relate match metrics to summer‑mean soil moisture at the SeaTac grid to assess modulation of synoptic control by surface dryness.
  • Sample size sensitivity of maxima: For both the observed GEV (sampling synthetic TXx) and the model ensemble (direct sampling years), draw samples from 10 to ~45,000 years, repeat 1000 times per size, and record the sample maximum TXx to quantify how maxima increase and uncertainty declines with sample size.
  • Nonstationarity within ensemble: For calendar years 2004–2021 (1728 samples per year), compute TXx distributions and their medians; examine the annual maximum TXx per calendar year versus the number of samples per year (hindcast ramp‑up/down) to disentangle effects of warming from sampling. Additionally, illustrate selection‑bias effects via a schematic extrapolation to 2050 using randomized GEVs with fixed scale/shape and a small imposed linear trend in the location parameter (0.06°C per year) to show when baseline warming might surpass a selected extreme outlier.
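
The ensemble-size arithmetic and the mean-bias correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the tuple layouts for forecasts and observations are hypothetical, and pooling all members and start years into ⟨f⟩(d,τ), and all observed years into ⟨o⟩(d), is an assumption about how the means are formed.

```python
from collections import defaultdict
from statistics import mean

# Ensemble size quoted in the text: start years x lead years x starts x members.
N_START_YEARS, N_LEAD_YEARS, N_STARTS, N_MEMBERS = 26, 9, 2, 96
SAMPLE_YEARS = N_START_YEARS * N_LEAD_YEARS * N_STARTS * N_MEMBERS  # 44,928
SAMPLES_PER_CALENDAR_YEAR = N_LEAD_YEARS * N_STARTS * N_MEMBERS     # 1,728


def bias_correct(forecasts, observations):
    """Apply f'(d, tau) = f(d, tau) - (<f>(d, tau) - <o>(d)).

    forecasts:    list of (day_of_year, lead, value) tuples (hypothetical layout)
    observations: list of (day_of_year, value) tuples (hypothetical layout)
    Returns a list of (day_of_year, lead, corrected_value) tuples.
    Raises KeyError if a forecast day has no matching observation.
    """
    model_groups = defaultdict(list)
    for day, lead, value in forecasts:
        model_groups[(day, lead)].append(value)

    obs_groups = defaultdict(list)
    for day, value in observations:
        obs_groups[day].append(value)

    # Mean model value per (day-of-year, lead) and mean observation per day-of-year.
    model_mean = {key: mean(values) for key, values in model_groups.items()}
    obs_mean = {day: mean(values) for day, values in obs_groups.items()}

    return [
        (day, lead, value - (model_mean[(day, lead)] - obs_mean[day]))
        for day, lead, value in forecasts
    ]
```

By construction the corrected forecasts share the observed mean for each day of year and lead, but nothing constrains the tails, which is consistent with the note that the hottest-day mean is not necessarily aligned.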
Key Findings
  • The SeaTac 2021 record (42.2°C) is a demonstrable outlier, materially altering the fitted GEV warm tail when included; excluding the prior 2009 record truncates the warm tail by ~2°C; including 2021 extends it by ~3°C.
  • Large ensemble extremes: The hottest simulated SeaTac grid day in the 44,928‑year ensemble reaches 46.5°C. Synoptic patterns on the hottest few model days closely resemble the observed event: a wave train with an embedded blocking high over SW Canada and a coastal trough/cutoff low directing hot downslope continental flow.
  • Weather pattern rarity: Among ~4.59 million summer days, only a handful closely match the ensemble’s hottest‑day Z500 pattern; closer matches tend to yield higher temperatures, but soil moisture conditions can moderate the outcome.
  • Soil moisture modulation: The very hottest TXx occurrences are more likely during drier summers; similar synoptic setups in wetter summers produce less extreme TXx.
  • Return periods and sample size: Direct sampling yields a return period of roughly 270 years for TXx ≥ 42.2°C in the model, but with wide uncertainty at small samples. Even samples of ≥1000 years give estimates spanning ~100–500 years; more than 5000 years are needed for tighter bounds. GEV‑based estimates from small samples (50–100 years) are highly unstable, spanning tens to thousands of years; fitting a distribution does not remove the need for large samples.
  • Maxima vs sample size: Both observed‑GEV sampling and direct model sampling show the sample maximum TXx increases with sample size and the uncertainty narrows. In model sampling of ~100‑year segments (typical observational length), maxima generally range ~39–45°C; capturing the absolute hottest outlier (46.5°C) is very unlikely in a single 100‑year sample.
  • Nonstationarity: TXx distributions in the ensemble shift warmer over 2004–2021 (median increase up to ~1°C), but the annual maximum TXx (from 1728 samples per year) shows no clear trend during the period with constant sample size, indicating that extreme record‑level events are dominated by sampling of rare weather configurations rather than the modest underlying warming over this interval.
  • Selection bias and endurance of records: Selecting an extreme after it has occurred at a fixed location guarantees its presence in the observations, but not in a model sample queried only at that location. Subsequent observed years add very few samples, making another freak alignment exceedingly unlikely; thus, record‑shattering heat records can endure for decades unless baseline warming sufficiently elevates non‑freak extremes to surpass them.
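
The sample-size dependence of direct-sampling return periods can be illustrated with a small Python sketch. The hindcast data are not available here, so the code draws a synthetic TXx population from a GEV using the reported model parameters, reading k as the standard shape parameter ξ (negative means a bounded upper tail); that reading, and every function name, is an assumption of this sketch. Because the population is a stand-in, the numbers will not reproduce the study's ~270-year estimate; only the qualitative behaviour is of interest.

```python
import math
import random

# Assumption: reported model GEV parameters, with k read as the shape xi.
MODEL_GEV = {"mu": 34.8, "sigma": 2.33, "xi": -0.20}
THRESHOLD_C = 42.2  # SeaTac record, degrees C


def gev_quantile(p, mu, sigma, xi):
    """Inverse CDF of a GEV; written for xi < 0 and p in (0, 1]."""
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def synthetic_population(n_years, rng, mu, sigma, xi):
    # 1 - rng.random() lies in (0, 1], which keeps log() finite.
    return [gev_quantile(1.0 - rng.random(), mu, sigma, xi) for _ in range(n_years)]


def empirical_return_period(sample, threshold):
    """Years per exceedance; infinite when the sample has no exceedance."""
    hits = sum(1 for x in sample if x >= threshold)
    return len(sample) / hits if hits else math.inf


def return_period_spread(population, threshold, sample_size, repeats, rng):
    """Quartiles of the return-period estimate over random subsamples."""
    estimates = sorted(
        empirical_return_period(rng.sample(population, sample_size), threshold)
        for _ in range(repeats)
    )
    no_hit = sum(1 for e in estimates if math.isinf(e)) / repeats
    # Order statistics rather than interpolation, so infinities are handled.
    quartiles = tuple(estimates[(repeats * q) // 4] for q in (1, 2, 3))
    return {"quartiles": quartiles, "no_exceedance_fraction": no_hit}


if __name__ == "__main__":
    rng = random.Random(0)
    population = synthetic_population(44_928, rng, **MODEL_GEV)
    for n in (100, 1000, 5000):
        print(n, return_period_spread(population, THRESHOLD_C, n, 200, rng))
```

At observational record lengths (around 100 years) many subsamples contain no exceedance at all, so the empirical return period is undefined for them; this is one concrete way to see why short samples give unstable estimates.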
Discussion

The findings show that at fixed locations, record‑shattering heat events are primarily governed by rare, precise synoptic configurations often preconditioned by dryness. Because such weather alignments are sparsely sampled, very large datasets are required to quantify their likelihood and to reproduce them in models. This resolves the apparent paradox of enduring old records despite warming: the small number of additional observed years after a shattering event offers negligible chance of drawing another extreme alignment. Warming shifts the entire TXx distribution (raising common extremes), but over limited periods it may not visibly affect record‑level maxima given sampling limitations. Consequently, the absence of comparable events in climate projections at the same location and epoch can reflect sampling constraints rather than model deficiency. The study clarifies that interpreting extreme outliers requires careful consideration of sampling, selection bias, and the interplay of weather variability with a warming baseline.
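
A rough calculation makes the endurance argument concrete. The sketch below assumes independent years and computes the probability that a record survives a given number of years, first with a constant annual exceedance probability and then with a GEV whose location parameter drifts upward. Combining the reported model GEV parameters (k read as the shape ξ) with the 0.06°C per year trend from the schematic is an assumption made here for illustration, not a calculation taken from the study.

```python
import math


def gev_exceedance_prob(x, mu, sigma, xi):
    """P(X >= x) for a GEV with shape xi != 0."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: above the upper bound (xi < 0)
        # or below the lower bound (xi > 0).
        return 0.0 if xi < 0 else 1.0
    return 1.0 - math.exp(-(t ** (-1.0 / xi)))


def survival_probability(annual_probs):
    """Probability of no exceedance in any listed year (independent years)."""
    prob = 1.0
    for p in annual_probs:
        prob *= 1.0 - p
    return prob


def record_survival(record, mu, sigma, xi, years, trend_per_year=0.0):
    """Survival probability when the GEV location rises linearly each year."""
    probs = [
        gev_exceedance_prob(record, mu + trend_per_year * t, sigma, xi)
        for t in range(1, years + 1)
    ]
    return survival_probability(probs)


if __name__ == "__main__":
    # Stationary case with a 1-in-270-year event over 30 years.
    print(survival_probability([1 / 270] * 30))
    # Assumed parameters: model GEV fit, with and without a location trend.
    print(record_survival(42.2, 34.8, 2.33, -0.20, years=30))
    print(record_survival(42.2, 34.8, 2.33, -0.20, years=30, trend_per_year=0.06))
```

With a stationary 1-in-270-year exceedance probability, the chance that the record is still standing after 30 years is about 0.89. Adding a trend in the location parameter lowers that survival probability, which matches the qualitative point that records endure until baseline warming lifts ordinary extremes toward the old record.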

Conclusion

The study demonstrates, using a very large hindcast ensemble and observations from SeaTac, that record‑shattering heat records at fixed locations can persist for decades due to the rarity of the necessary synoptic alignments and antecedent dryness. Key contributions include: quantifying the large sample sizes needed to estimate return periods of such events; showing that model‑simulated hottest days intensify with increasing sample size; evidencing strong synoptic control moderated by soil moisture; and disentangling the effects of modest nonstationarity from sampling on extreme maxima. The results caution against judging model fidelity by the absence of selected local outliers in undersized samples and underscore the need to account for sampling and selection bias in extreme event analyses. Future research should extend these large‑ensemble methods to other types of extremes and regions, evaluate sensitivity to model physics and resolution, and investigate regime changes that could alter extreme generation mechanisms.

Limitations
  • Model–observation mismatch: Station point measurements vs model gridbox averages; coastal proximity and topography differences; residual distributional biases despite simple mean‑bias correction.
  • Model dependence: Results rely on one hindcast system (ACCESS‑D/GFDL CM2.1); representation of synoptic variability and blocking may vary across models.
  • Sampling constraints: Even with ~44,928 years total and 1,728 years per calendar year, record‑level events remain undersampled; conclusions about the hottest attainable day could change with even larger samples.
  • GEV sensitivity: Fitted extremes are sensitive to influential points and choices about including/excluding record years; small samples yield unstable parameter estimates and return periods.
  • Assumed trend illustration: The extrapolation schematic imposes a simple linear trend on the GEV location parameter; real‑world nonstationarity may be nonlinear or involve regime shifts.
  • Generalizability: Findings pertain to local, weather‑sensitive heat extremes; other extremes or regions with different drivers may not exhibit the same sampling requirements.
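
The sampling and GEV caveats above can be explored with a short simulation of how the sample maximum grows with sample size. This sketch draws synthetic TXx values from a GEV with the reported observed parameters, again reading k as the shape ξ so that a negative value implies a finite upper endpoint at μ − σ/ξ. That reading and the function names are assumptions, and the repeat count is kept small for speed (the study uses 1000 repeats per size).

```python
import math
import random
from statistics import median

# Assumption: reported observed GEV fit, with k read as the shape xi.
OBS_GEV = {"mu": 33.1, "sigma": 2.51, "xi": -0.22}


def gev_quantile(p, mu, sigma, xi):
    """Inverse CDF of a GEV; written for xi < 0 and p in (0, 1]."""
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def upper_endpoint(mu, sigma, xi):
    """Finite upper bound of a GEV with xi < 0."""
    return mu - sigma / xi


def sample_maximum(n_years, rng, mu, sigma, xi):
    """Hottest TXx in one synthetic sample of n_years."""
    return max(
        gev_quantile(1.0 - rng.random(), mu, sigma, xi) for _ in range(n_years)
    )


def maxima_summary(sample_sizes, repeats, seed=0, **gev):
    """(min, median, max) of the sample maximum for each sample size."""
    rng = random.Random(seed)
    summary = {}
    for n in sample_sizes:
        maxima = sorted(sample_maximum(n, rng, **gev) for _ in range(repeats))
        summary[n] = (maxima[0], median(maxima), maxima[-1])
    return summary


if __name__ == "__main__":
    print("upper endpoint:", round(upper_endpoint(**OBS_GEV), 2))
    for n, stats in maxima_summary([10, 100, 1000, 10000], 100, **OBS_GEV).items():
        print(n, [round(v, 2) for v in stats])
```

Under these assumed parameters the implied upper endpoint is about 44.5°C, and the median sample maximum creeps toward it as the sample grows. Because the endpoint depends on the fitted shape and scale, it inherits the sensitivity to influential points noted above.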