The presence of Superfund sites as a determinant of life expectancy in the United States

Environmental Studies and Forestry

A. Kiaghadi, H. S. Rifai, et al.

This study by Amin Kiaghadi, Hanadi S. Rifai, and Clint N. Dawson examines how Superfund sites affect life expectancy in the U.S., finding significant health risks for disadvantaged communities and showing how site characteristics and sociodemographic factors amplify the effect.

Introduction
Life expectancy (LE) is a fundamental public health indicator, with large observed disparities driven by inequalities in mortality risk linked to sociodemographic variables and disease burden. In developed settings such as the U.S., exposures to chemical and biological hazards can contribute to non-communicable diseases and thereby affect LE. Despite evidence of health risks from hazardous waste, national-scale analyses of the impact of Superfund sites on LE at fine spatial resolution have been limited. This study investigates whether, and to what extent, living near a Superfund site is associated with changes in LE in the United States, considering both independent effects and interactions with sociodemographic determinants. It also evaluates the role of site characteristics, notably flood susceptibility and cleanup status, given concerns that many Superfund sites are vulnerable to natural hazards that may mobilize contaminants and expand exposure footprints. The research addresses four questions: (1) Is the presence of a Superfund site within a census tract independently associated with a significant change in LE compared with neighboring tracts without sites? (2) What is the magnitude of the LE change attributable to Superfund presence after accounting for sociodemographic determinants? (3) How does the Superfund effect vary across sociodemographic strata, particularly for the most vulnerable populations? (4) How do site characteristics (flooding potential, National Priorities List (NPL) status, and cleanup status) modify the effect on LE?
Literature Review
Prior studies link sociodemographic factors (race/ethnicity, income, education, age, etc.) and disease burdens to LE disparities. Evidence suggests that contaminant releases from industrial and hazardous waste sources can increase mortality risks in fence-line communities, though findings are not uniformly consistent across health outcomes. For proximity to hazardous waste and Superfund sites, some research found increased risks of outcomes such as congenital anomalies and non-Hodgkin’s lymphoma, particularly near unremediated sites, while other studies found no association with certain outcomes (e.g., maternal-fetal death). In Europe, exposure to benzene emissions near sources correlated positively with mortality. U.S. studies reported associations between residential proximity to Superfund sites and non-Hodgkin’s lymphoma, especially among males. Economic evaluations have questioned whether cleanup affects housing markets, but they estimate substantial medical costs and productivity losses from exposures at sites with completed exposure pathways. Literature on disasters and climate change emphasizes that flooding and extreme weather can mobilize contaminants from hazardous sites, potentially extending exposures beyond immediate neighbors. A Government Accountability Office report indicated that about 60% of EPA-managed Superfund sites could be affected by natural hazards such as flooding and wildfire, underscoring the need to assess how flood vulnerability may influence health impacts. Few national studies at sub-county scales have comprehensively quantified the impacts of Superfund sites on LE; most prior research focuses on NPL sites (~1,300) and excludes the much larger universe of non-NPL hazardous sites (~11,700).
Methodology
Design and scope: A nationwide, geocoded, census-tract-level observational study was conducted across the contiguous U.S. (72,268 tracts, of which 65,226 had LE data). The study integrates life expectancy estimates, sociodemographic data, Superfund site locations and attributes, and flood hazard information. Analyses include matched neighbor comparisons, multivariable ordinary least squares (OLS) regression, Random Forests (RF) modeling, effect-modification analysis, quantile regression, and evaluations of site characteristics (NPL status, cleanup status, flood susceptibility).
Data sources: (1) Sociodemographic data (2018) at the census-tract level from NHGIS/IPUMS, including age structure, income, poverty, race, disability, marital status, health insurance coverage, education, and citizenship. (2) Life expectancy (2010–2015) from the National Center for Health Statistics’ U.S. Small-Area Life Expectancy Estimates Project (USALEEP); for Maine and Wisconsin, LE values were estimates because geocoded death records were missing. (3) Superfund site data (active and archived sites as of 2019) from EPA’s Superfund Enterprise Management System (SEMS): of 13,093 sites, 1,864 had coordinates, and the remainder were batch-geocoded with Geocodio, yielding 11,989 unique point locations in the contiguous U.S. (4) Flood hazard data (2000–2020) from FEMA’s National Flood Hazard Layer (NFHL), merged from county-level layers into a national dataset.
Exposure and site classifications: A 322 m (0.2 mile) buffer around each site point was used to assess floodplain overlap. A site was classified as prone to flood if ≥25% of its buffer area intersected FEMA-defined floodways or 100- or 500-year floodplains, and as minimal flood risk otherwise; sites in areas without FEMA coverage were labeled “unknown” and excluded from inferential analyses. Sensitivity analyses varying the buffer radius from 100 m to 5,000 m changed the share of flood-prone sites by about ±20%; future work could use true site footprints instead of buffered points.
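The buffer-based flood classification can be illustrated with a small, dependency-free sketch. It is not the study’s code: it assumes site points and flood zones in a projected coordinate system measured in metres and, for simplicity, represents FEMA flood zones as axis-aligned rectangles, a hypothetical stand-in for real NFHL polygons (in practice a GIS polygon intersection would be used). The overlap fraction is approximated by counting regular grid points inside the circular buffer; `buffer_flood_fraction` and `classify_site` are illustrative names.

```python
import math

BUFFER_M = 322.0   # 0.2-mile buffer radius in metres (from the study)
THRESHOLD = 0.25   # >= 25% of buffer area in a flood zone -> "prone to flood"


def buffer_flood_fraction(x, y, floodplains, radius=BUFFER_M, resolution=64):
    """Approximate the share of a circular buffer around (x, y) that falls
    inside any flood-zone rectangle (xmin, ymin, xmax, ymax).

    Hypothetical simplification: real NFHL zones are polygons, and the
    area is estimated here by counting regular grid points in the circle.
    """
    step = radius / resolution
    inside = hits = 0
    for i in range(-resolution, resolution + 1):
        for j in range(-resolution, resolution + 1):
            dx, dy = i * step, j * step
            if dx * dx + dy * dy > radius * radius:
                continue  # grid point lies outside the circular buffer
            inside += 1
            px, py = x + dx, y + dy
            if any(x0 <= px <= x1 and y0 <= py <= y1
                   for (x0, y0, x1, y1) in floodplains):
                hits += 1
    return hits / inside


def classify_site(x, y, floodplains, radius=BUFFER_M):
    """Return one of the three flood categories; floodplains=None stands
    for a location with no FEMA coverage (hypothetical convention)."""
    if floodplains is None:
        return "unknown"
    share = buffer_flood_fraction(x, y, floodplains, radius)
    return "prone to flood" if share >= THRESHOLD else "minimal flood risk"


# Example: a flood zone covering the eastern half of the buffer.
print(classify_site(0.0, 0.0, [(0.0, -1000.0, 1000.0, 1000.0)]))  # prone to flood
```

Passing a different `radius` (for example 100 m or 5,000 m) reruns the classification for a buffer-radius sensitivity check; because the grid step scales with the radius, the cost per site stays roughly constant.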
NPL classification grouped sites currently on the Final NPL, or part of an NPL site, as “NPL” and all other statuses as “Not NPL.” Cleanup status was categorized as “active cleanup” (all NPL sites and any non-NPL site with cleanup activity), “no cleanup” (non-NPL sites under assessment or review), and “unknown” (excluded from models using this factor). Superfund presence within a census tract was assigned by spatial join; when a tract contained multiple sites, its classification gave priority to the “prone to flood,” “NPL,” and “no cleanup” categories.
Variables and preprocessing: The LE distribution was approximately normal, whereas Kolmogorov-Smirnov tests indicated that the sociodemographic variables were non-normal. Spearman’s rank correlations guided variable selection and identified collinearity; among highly correlated variables, one representative measure was retained. The final modeling variables were: percent of the population aged ≥60 years (Above60), median income (Income, per $10,000), percent white (White), percent with a disability (Disability), percent married (Married), percent with health insurance (Insurance), percent with education beyond high school (Education), percent U.S. citizens (Citizenship), and Superfund presence (binary 0/1).
Matching analyses: Each treated tract (with ≥1 Superfund site) was compared with the median LE and median sociodemographic values of its immediately adjacent neighbor tracts without sites (unexposed). Independent-samples Mann-Whitney U tests (and two-tailed t-tests for LE) assessed the differences. As a control, the matching procedure was repeated across all tracts, irrespective of Superfund status, to check for innate differences between neighbors.
Quantification via regression and machine learning: OLS regression modeled LE as a function of Superfund presence and sociodemographic covariates. An initial unadjusted, Superfund-only OLS model was followed by stepwise inclusion of the eight covariates (nine models in total), culminating in a full model that accounts for confounding.
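The neighbor-matching comparison can be sketched in plain Python. This is a minimal illustration, not the authors’ code: the dictionaries `le`, `has_site`, and `neighbors` are hypothetical data structures, and skipping treated tracts that have no unexposed neighbor is an assumption the summary does not state.

```python
from statistics import median


def neighbor_matched_pairs(le, has_site, neighbors):
    """For each tract with >= 1 Superfund site, pair its LE with the median LE
    of its adjacent tracts that contain no site (the unexposed neighbours).

    le:        tract ID -> life expectancy (tracts without LE data are absent)
    has_site:  tract ID -> True if the tract contains >= 1 Superfund site
    neighbors: tract ID -> list of adjacent tract IDs
    """
    pairs = []
    for tract, treated in has_site.items():
        if not treated or tract not in le:
            continue
        unexposed = [le[n] for n in neighbors.get(tract, [])
                     if n in le and not has_site.get(n, False)]
        if unexposed:  # assumption: skip treated tracts with no unexposed neighbour
            pairs.append((le[tract], median(unexposed)))
    return pairs


# Toy example with hypothetical tract IDs and LE values.
le = {"A": 76.0, "B": 78.0, "C": 79.0, "D": 80.0}
has_site = {"A": True, "B": False, "C": False, "D": True}
neighbors = {"A": ["B", "C", "D"], "D": ["A"]}
print(neighbor_matched_pairs(le, has_site, neighbors))  # [(76.0, 78.5)]
```

The treated LE values and their neighbor medians could then be compared with a Mann-Whitney U test (for example, `scipy.stats.mannwhitneyu`) or a two-tailed t-test, as in the study.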
The full OLS model’s performance was compared with that of a non-linear Random Forests model (MATLAB TreeBagger; 100 trees; minimum leaf size 20), trained on 50,000 tracts and validated on 15,226 tracts, to assess potential gains from non-linear modeling while noting the trade-off in interpretability.
Effect modification and quantile regression: To evaluate heterogeneity of the Superfund effect across sociodemographic strata, separate OLS models included an interaction term defined as the product of Superfund presence (0/1) and a 0/1 indicator of being above or below the national median of each sociodemographic variable; β1 (the Superfund effect) and β1+β2 (the Superfund effect within the subgroup) were compared. Quantile regression among tracts with at least one site (N=12,717) examined how coefficients vary across LE quantiles, capturing non-linear and heteroskedastic effects.
Site characteristic analyses: Among tracts with ≥1 site, Kruskal-Wallis and Mann-Whitney U tests compared LE by flood classification, NPL status, and cleanup status. Separate OLS models included dummy interactions of Superfund presence with each site characteristic.
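The interaction and quantile models can be written compactly as follows. This is a sketch under stated assumptions rather than the authors’ exact specification: S_i is the 0/1 Superfund indicator for tract i, D_i is the 0/1 subgroup indicator (above or below the national median, or a 0/1 site-characteristic dummy), and X_ik are the other covariates; whether D_i also enters as a separate main effect is not stated in the summary, so it is omitted here. For the flooding model, for instance, the reported values correspond to β1 ≈ −0.034 and β2 ≈ −0.33 years.

```latex
\mathrm{LE}_i = \beta_0 + \beta_1 S_i + \beta_2\,(S_i D_i) + \sum_{k} \gamma_k X_{ik} + \varepsilon_i

\mathbb{E}[\mathrm{LE}_i \mid S_i = 1, D_i, X_i] - \mathbb{E}[\mathrm{LE}_i \mid S_i = 0, D_i, X_i] = \beta_1 + \beta_2 D_i

\hat{\boldsymbol{\beta}}(\tau) = \arg\min_{\boldsymbol{\beta}} \sum_{i} \rho_\tau\!\left(\mathrm{LE}_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right),
\qquad
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
```

In the quantile model, τ is the LE quantile of interest and x_i stacks the Superfund indicator and the covariates; the asymmetric check loss ρ_τ is what lets the Superfund coefficient differ between low-LE and high-LE tracts.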
Key Findings
- Descriptive differences: Median LE was 78.50 years across all tracts, 77.50 years in tracts with at least one site, and 78.70 years in tracts with no sites, an unadjusted gap of about 1.0–1.2 years.
- Matching analyses: LE differed significantly between tracts with sites and their neighbors without sites (t-test P=8.96E-15; Mann-Whitney U P<0.05). When matching was repeated for all tracts regardless of site status, there was no significant general neighbor effect (t-test P=0.06). Sociodemographic variables also differed significantly between site tracts and their neighbors (except citizenship), which justifies multivariable adjustment.
- OLS regression: The unadjusted Superfund-only coefficient was −1.146 years; after adjustment for eight sociodemographic covariates, it was −0.186 ± 0.027 years (P<0.0001), i.e., site presence was associated with an average LE reduction of 0.186 years. Final model: R=0.739, adjusted R^2=0.546, RMSE=2.694 years; all covariates were significant (P<0.0001). Unstandardized coefficients: Above60=+0.053 (±0.002), White=+0.026 (±0.001), Income=+0.236 (±0.007, per $10,000), Insurance=+0.031 (±0.002), Married=+0.063 (±0.001), Education=+0.068 (±0.001), Citizenship=−0.110 (±0.001), Disability=−0.013 (±0.003), Superfund=−0.186 (±0.027).
- Random Forests performance: RF predicted slightly better than OLS (validation RMSE 2.578 vs 2.908 years) and explained modestly more variability. Because performance was similar, OLS was retained for interpretability.
- Heterogeneity by sociodemographics (effect modification): In tracts with at least one site (N=12,717), the adverse Superfund effect was stronger in more disadvantaged strata. For example, tracts with income below the national median ($52,580) showed an average LE reduction of 0.58 years, reaching 1.223 years in the 10% of tracts with the lowest income. High income could offset the negative Superfund effect: in above-median-income tracts, the combined effect was positive (+0.32 years). Higher education, a higher share of married residents, and higher insurance coverage attenuated the effect similarly, whereas disability showed a minimal stratified difference (~0.04 years). A higher proportion of adults aged ≥60 reduced the magnitude of the adverse effect, while a higher citizenship share (fewer immigrants) corresponded to a greater adverse effect.
- Quantile regression: Superfund presence had a more negative impact at lower LE quantiles, amplifying existing disadvantage, and a diminishing effect at higher quantiles. Education and insurance had stronger positive effects at lower LE quantiles (e.g., a 10-percentage-point increase could raise LE by ~1 year in low-LE tracts) and weaker effects in high-LE tracts, whereas White (%) and income showed larger positive effects at higher LE quantiles.
- Site characteristics:
  • NPL status: Tracts with NPL sites had higher median LE than tracts with non-NPL sites (78.2 vs 77.4 years; Mann-Whitney P=1.03E-12). In OLS with an NPL interaction, the Superfund effect was near zero for NPL tracts (−0.001 years) versus −0.217 years for non-NPL tracts, suggesting that mitigation, monitoring, and redevelopment around NPL sites may offset adverse impacts.
  • Cleanup status: LE differed significantly between tracts with active cleanup and those with no cleanup (median 77.5 vs 77.35 years). Effect modification indicated only a small improvement (~+0.065 years) associated with active cleanup, potentially reflecting long latency and chronic exposure before remediation.
  • Flood susceptibility: Using the ≥25% inundation threshold, tracts with flood-prone sites had lower median LE (77.20 years) than tracts with minimal-risk sites (77.60 years). In OLS with a flooding dummy, minimal flood risk was associated with −0.034 years, and flooding amplified the adverse effect by an additional −0.33 years (total ~−0.36 years). A simple binary flood definition without the threshold obscured these differences (P=0.19), highlighting the value of the threshold approach.
- Coverage of sites: 11,989 unique sites were identified; using the 25% threshold, ~24% of non-NPL and ~21% of NPL sites were located in flood-prone areas. In sensitivity analysis, the flood classification varied by ±20% with buffer radius.
Overall, the presence of a Superfund site is associated with a statistically significant reduction in LE after adjusting for sociodemographic confounders, with stronger adverse effects in disadvantaged contexts and where sites are non-NPL, lack cleanup, or are flood-prone.
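Because the final model is linear, the reported coefficients can be combined directly to compare hypothetical tracts. The sketch below hard-codes the unstandardized coefficients listed above; the intercept is not reported, so only differences in predicted LE are computed. The function name and the example tracts are illustrative assumptions, and the results are model associations, not causal effects.

```python
# Unstandardized coefficients of the final (full) OLS model as reported above:
# years of LE per percentage point, except Income (per $10,000) and
# Superfund (0/1 indicator). The intercept is not reported, so only
# differences between tracts can be computed.
COEF = {
    "Above60": 0.053, "White": 0.026, "Income": 0.236, "Insurance": 0.031,
    "Married": 0.063, "Education": 0.068, "Citizenship": -0.110,
    "Disability": -0.013, "Superfund": -0.186,
}


def predicted_le_difference(changes):
    """Model-predicted LE difference (years) between two hypothetical tracts
    that differ only by the given covariate changes (name -> delta)."""
    return sum(COEF[name] * delta for name, delta in changes.items())


print(predicted_le_difference({"Superfund": 1}))     # -0.186
print(predicted_le_difference({"Education": 10}))    # about 0.68
# Separate flooding-interaction model: minimal-risk effect plus flood increment.
flood_prone_total = -0.034 + -0.33
print(round(flood_prone_total, 3))                   # -0.364
```

Comparing the two printed differences shows, in model terms, that a 10-point rise in post-high-school education is associated with a larger LE difference than the presence of a site, which mirrors the attenuation pattern in the effect-modification results.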
Discussion
The study demonstrates that Superfund site presence within a census tract is associated with lower LE, even after adjusting for key sociodemographic determinants. It answers the first (qualification) research question by showing significant LE differences between site tracts and their neighbors, and the second (quantification) question by estimating an average adjusted effect of −0.186 years. The analyses further reveal that the Superfund impact is not uniform: it is most pronounced in tracts with greater sociodemographic disadvantage (lower income, education, and insurance coverage; fewer married individuals) and in lower LE strata, where incremental improvements in social determinants yield larger gains. Site-specific characteristics materially influence health impacts. NPL listing appears to attenuate, or nearly eliminate, the net adverse association, likely reflecting prioritized cleanup, monitoring, and redevelopment that improve local conditions and reduce exposures. Active cleanup is associated with only small LE improvements, possibly because long exposure histories preceded remediation. Flood susceptibility substantially amplifies adverse impacts, consistent with the mechanistic expectation that flooding mobilizes contaminants and introduces new exposure pathways affecting both fence-line and more distant communities. The results carry public health and environmental policy implications: continued prioritization of high-risk sites for cleanup and monitoring, attention to flood-vulnerable sites (especially non-NPL sites), and targeting of interventions toward disadvantaged communities where impacts are greatest. The modest performance gap between the linear and non-linear models suggests that interpretable regression captures much of the predictable variation while providing actionable effect estimates.
Assumptions about residence time and about uniform exposure irrespective of contaminants and pathways, together with possible residual confounding, warrant caution in causal interpretation; nonetheless, consistent patterns across multiple analyses support the central finding that Superfund presence is a determinant of LE disparities.
Conclusion
This nationwide, tract-level analysis provides evidence that living near a Superfund site is associated with reduced life expectancy, independent of and interacting with sociodemographic factors. The average adjusted LE reduction is approximately 0.186 years, with substantially larger reductions (up to ~1.22 years) among the most socioeconomically disadvantaged tracts. Flood-prone sites, non-NPL status, and lack of cleanup are associated with more adverse outcomes, whereas NPL listing appears to offset much of the negative association, likely through enhanced remediation and monitoring. The study contributes a comprehensive, fine-scale assessment that incorporates both NPL and non-NPL sites and evaluates flood vulnerability. Future research should incorporate temporal analyses of site histories and population mobility, improved exposure characterization by contaminant profiles and pathways, refined spatial delineation of site boundaries, and additional confounding controls (e.g., stress, environmental co-exposures). Enhanced modeling and longitudinal designs would strengthen causal inference and guide targeted interventions to reduce environmental health disparities.
Limitations
- The cross-sectional design, without temporal tracking of exposure, residential mobility, or site history, limits causal inference and may introduce exposure misclassification.
- The analysis assumes that 2010–2015 LE reflects sufficiently long residence near sites; uncertainty in residence time could produce differential measurement error.
- Residual confounding from unmeasured variables (e.g., stress, other environmental exposures, nuances of healthcare access) is possible despite adjustment for key sociodemographics, as reflected in the relatively high RMSE after adjustment.
- Assuming uniform exposure across heterogeneous sites (varying contaminants, physical states, and pathways) may mask site-specific effects.
- Geospatial uncertainties: many sites required batch geocoding; analyses used point locations with a 322 m buffer rather than precise site footprints; flood classification is sensitive to the buffer radius and threshold definition; and FEMA coverage gaps produced “unknown” categories that were excluded from the relevant analyses.
- Model performance limitations: even the RF model had a validation RMSE of ~2.6 years; OLS performed worse at extreme high LE quantiles; and linear models may not fully capture non-linearities and interactions.
- LE values for Maine and Wisconsin were estimates because geocoded death records were missing, which may introduce error.