
Neighborhood-level disparities and subway utilization during the COVID-19 pandemic in New York City
D. Carrión, E. Colicino, et al.
This study by Daniel Carrión and colleagues examines how neighborhood social disadvantage relates to COVID-19 infection rates, mortality, and social distancing in NYC during spring 2020. Using ZIP code-level data, the authors find that greater disadvantage is associated with worse health outcomes, highlighting inequalities exacerbated by the pandemic.
Introduction
The study addresses whether neighborhood-level social disadvantage in New York City is associated with higher COVID-19 infections, reduced capacity to socially distance, and increased mortality during Spring 2020. In the early pandemic, with limited treatments and testing capacity, non-pharmaceutical interventions (e.g., social distancing) were central. Disproportionate burdens on communities of color were evident, with higher infection and mortality rates among Black and Hispanic/Latinx populations. The authors hypothesize that structural factors—such as employment in essential jobs, commuting patterns, population and residential density, multigenerational households, food access, socioeconomic status, and healthcare access—reduce the ability to socially distance and elevate infection risk. They propose constructing a composite, ZIP Code Tabulation Area (ZCTA)-level inequity index tuned to infection data to quantify these relationships and examine links to mortality and subway ridership as a proxy for social distancing capacity.
Literature Review
Prior work and emerging evidence indicate COVID-19’s disproportionate toll on communities of color, with higher early mortality and incidence among Black and Hispanic/Latinx populations. Structural racism manifests through housing, employment, earnings, healthcare access, and criminal justice, shaping health disparities and exposure risk. Residential segregation and structural disadvantages have been tied to infectious disease disparities, and social distancing has been shown to be more difficult in communities of color due to higher representation in low-wage and essential occupations, greater household crowding and multigenerational living, and poorer food access necessitating travel. Studies at county and ZIP code scales show associations between socioeconomic measures (crowding, percent people of color, racialized economic segregation) and higher COVID-19 infections and mortality, and mobility disparities linked to subway and cellphone data. Concerns about testing access as a confounder have mixed evidence; some analyses found no income-related testing inequalities. The present work extends this literature by creating a supervised, composite index tailored to infection risk and by linking it to mobility and mortality outcomes.
Methodology
Design: Ecological, population-level study in NYC using publicly available datasets. Primary spatial unit: modified ZIP Code Tabulation Areas (ZCTAs; n=177). Time windows: cumulative infections as of May 7, 2020 (approximately 4 weeks after peak); cumulative mortality as of May 23, 2020 (based on an estimated 16-day lag from symptom onset to death).
Data sources: NYC DOHMH SARS-CoV-2 testing (positive and total tests) and mortality by ZCTA; U.S. Census Bureau 2018 American Community Survey (ACS) for socioeconomic and demographic variables; NYC building footprints and PLUTO data to compute residential volume and residential population density; New York State retail food store data to estimate grocers per 1,000 population; MTA subway turnstile counts (2015–2019 baseline for normalization and 2020 daily ridership).
Exposure/index construction: A supervised composite COVID-19 inequity index was built via Bayesian Weighted Quantile Sums (BWQS) regression. Candidate variables included measures of household size, income (transformed reciprocally when negatively related), rent, rent burden, SNAP usage, poverty, health insurance status, unemployment, industry of employment to proxy essential workers, commuting modes, population density (persons per square foot) and residential density (persons per cubic foot), and food access (grocers per 1,000). Variables were transformed into deciles (quantiles) and scaled to allow an index range [0,10). Weights (Dirichlet(1) prior) were learned to reflect contributions to the outcome. Outcome for training: cumulative positive tests per 100,000.
Model: Negative binomial BWQS regression with log link; covariate adjustment for testing intensity via a natural spline (3 df) of the testing ratio (total tests/population). Priors: Normal(0,100) for regression coefficients; inverse-gamma(0.01,0.01) for overdispersion; sensitivity analyses used half-Cauchy(0,3) for overdispersion.
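The decile scoring and weighting step described above can be sketched as follows. This is a minimal illustration and not the authors' BWQS implementation: the weights here are fixed by hand, whereas BWQS estimates them jointly with the regression under a Dirichlet prior. The function names, variable names, toy values, and weights are all hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def decile_scores(values):
    """Score each value 0-9 by the decile of the sample it falls in."""
    cuts = quantiles(values, n=10)  # nine cut points separating the deciles
    return [bisect_right(cuts, v) for v in values]


def weighted_quantile_index(columns, weights):
    """Combine decile-scored variables into one index value per area.

    columns: dict of variable name -> list of raw values (one per area)
    weights: dict of variable name -> non-negative weight; weights sum to 1
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scored = {name: decile_scores(vals) for name, vals in columns.items()}
    n_areas = len(next(iter(scored.values())))
    # Weighted sum of decile scores, so every index value lies in [0, 9].
    return [
        sum(weights[name] * scored[name][i] for name in scored)
        for i in range(n_areas)
    ]


# Hypothetical toy data for 20 areas (not the study's data or weights).
toy = {
    "pct_uninsured": [float(i) for i in range(1, 21)],
    "household_size": [2.0 + 0.05 * i for i in range(20)],
}
toy_weights = {"pct_uninsured": 0.6, "household_size": 0.4}
index = weighted_quantile_index(toy, toy_weights)
```

Because the weights sum to 1 and each score is at most 9, the resulting index stays inside the [0,10) range mentioned above. In the full model, this index would enter a negative binomial regression with a log link alongside the testing-ratio spline.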
Estimation: Hamiltonian Monte Carlo; model selection/diagnostics included WAIC, Bayesian R-squared, RMSE, and residual diagnostics.
Sensitivity/robustness: Compared BWQS to (a) a simpler model with percent uninsured and median income, and (b) a PCA-based index (PC1) in negative binomial regression; assessed ZCTA-level summaries versus tract-derived median and 75th percentile measures aggregated to ZCTAs using HUD crosswalks; evaluated alternative priors; and considered multicollinearity thresholds (removed variables with very high pairwise correlation).
Mobility analysis (capacity to socially distance): Aggregated subway ridership to United Hospital Fund (UHF) neighborhoods (42 areas), retaining 36 UHFs with subways and consistent data. Relative daily ridership was normalized to median day-of-week, month-specific 2015–2019 baselines. Outlier low values from planned service changes were excluded. UHFs were split by population-weighted inequity index (above vs below median). Nonlinear decay in ridership was modeled using a generalized Weibull function (parameters included lower asymptote c, upper asymptote d, slope b, and inflection e) via maximum likelihood (drc R package). Model fit was compared between single and split (high/low index) curves (partial F-test), with comparisons of slopes and lower asymptotes; analyses were repeated with three risk groups and at ZCTA level.
Mortality analysis: Negative binomial regression of cumulative COVID-19 deaths per ZCTA on the inequity index, incorporating spatial filtering (including eigenvector associated with spatial autocorrelation measured by Moran’s I) to adjust for spatial dependence.
Software: R 4.0.2 with packages including tidycensus, sf, MASS, spdep, spatialreg, drc; mapping and spatial joins conducted in sf; code and data links provided by authors.
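The ridership normalization and decay curve from the mobility analysis can be illustrated with a short sketch. The curve below assumes the four-parameter Weibull type 1 form, c + (d - c) * exp(-exp(b * (log(x) - log(e)))); the summary does not state which Weibull variant the authors fitted, so treat this form as an assumption. The function names and every parameter value except the two lower asymptotes (16% and 9.5%, taken from the Key Findings) are hypothetical, and no fitting is performed here.

```python
import math
from statistics import median


def relative_ridership(count_2020, baseline_counts):
    """Divide a 2020 daily count by the median of baseline counts for the
    same day of week and month (for example, one value per year 2015-2019)."""
    return count_2020 / median(baseline_counts)


def weibull_decay(x, b, c, d, e):
    """Four-parameter Weibull curve (assumed type 1 form).

    Falls from the upper asymptote d toward the lower asymptote c as x
    grows when b > 0; e locates the inflection. x and e must be positive.
    """
    return c + (d - c) * math.exp(-math.exp(b * (math.log(x) - math.log(e))))


# Hypothetical turnstile counts: 200 entries on a 2020 day against five
# baseline years for the same weekday and month.
rel = relative_ridership(200, [900, 1000, 1100, 1000, 950])

# Illustrative curves: b, d, and e are made up; only the lower asymptotes
# c follow the reported estimates for high- and low-index areas.
days = [1, 7, 14, 21, 28, 60]
high_index = [weibull_decay(t, b=5.0, c=0.16, d=1.0, e=14.0) for t in days]
low_index = [weibull_decay(t, b=5.0, c=0.095, d=1.0, e=14.0) for t in days]
```

With equal slopes, the two curves differ only in where they level off, which is the comparison the split-model analysis is designed to make.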
Key Findings
- BWQS infection model: Each 1-unit increase in the COVID-19 inequity index was associated with an 8% increase in infections per capita (risk ratio 1.08; 95% credible interval 1.06, 1.09), adjusting for testing intensity; Bayesian R-squared 0.93 (95% CrI 0.92, 0.95). Residuals showed no significant deviation from expected distribution.
- Variable contributions: All ten variables contributed; the proportion uninsured was the largest contributor, followed by average household size, and the proportion of essential workers commuting by personal vehicle. In model iterations, uninsured and household size had the highest weights in 39% and 25% of iterations, respectively. Population density and median income were also informative.
- Method comparison: BWQS outperformed simpler two-variable and PCA-based models (smaller RMSE and higher Kendall’s tau rank correlation for expected vs observed infections per capita).
- Robustness to spatial summary choice: Using tract-level medians/75th percentiles aggregated to ZCTAs yielded consistent effect estimates and modest improvements in WAIC, Bayesian R-squared, and RMSE.
- Spatial and demographic patterns: The inequity index spatial distribution mirrored infections. Black and Hispanic/Latinx residents had higher population-weighted mean index values; white residents were overrepresented in low-index ZCTAs and underrepresented in high-index ZCTAs. In high-index (>75th percentile) ZCTAs, white populations comprised ~10% versus ~32% citywide.
- Subway ridership (capacity to socially distance): Models split by high vs low index fit significantly better than a combined model (partial F-test p<0.0001). Slopes were similar between groups (high: −5.7% per day, 95% CI −6.0, −5.3; low: −6.2% per day, 95% CI −6.5, −5.8; p≈0.45), but the lower asymptote (minimum sustained ridership under social distancing) was higher in high-index areas (16%, 95% CI 15.3, 16.7) versus low-index areas (9.5%, 95% CI 8.9, 10.1; p≈2×10⁻¹⁶), indicating reduced capacity to fully reduce transit use in disadvantaged areas. Results were consistent with three-group splits and at the ZCTA level; trends matched other mobility datasets (e.g., Google transit).
- Mortality: Each 1-unit increase in the inequity index was associated with a 20% increase in COVID-19 mortality risk (relative risk 1.20; 95% CI 1.16, 1.23) after spatial filtering. Residual spatial autocorrelation was small (Moran’s I 0.05) and non-significant (p=0.08).
- Descriptive counts: 174,614 positive tests across 177 ZCTAs (as of May 7, 2020); 16,289 COVID-related deaths across 177 ZCTAs (as of May 23, 2020).
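To read the per-unit ratios above across a larger index gap, note that under a log link the ratios multiply rather than add. The sketch below works through that arithmetic for the reported point estimates; the 5-unit gap is a hypothetical example, the function name is illustrative, and the result assumes the log-linear relationship holds across that span. It is not a figure reported by the study.

```python
import math


def ratio_over_units(ratio_per_unit, units):
    """Under a log link, per-unit ratios compound: RR(k) = RR(1) ** k."""
    return ratio_per_unit ** units


# Reported per-unit point estimates from the findings above.
infection_rr = 1.08
mortality_rr = 1.20

# Hypothetical comparison: two neighborhoods 5 index units apart.
infection_gap = ratio_over_units(infection_rr, 5)  # about 1.47
mortality_gap = ratio_over_units(mortality_rr, 5)  # about 2.49

# The matching regression coefficient is the natural log of the ratio.
infection_beta = math.log(infection_rr)
```

So a 5-unit difference corresponds to roughly 47% more infections per capita and roughly 2.5 times the mortality risk under these assumptions, ignoring the uncertainty in the interval estimates.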
Discussion
Findings support the hypothesis that structural neighborhood disadvantages are associated with higher COVID-19 infection incidence, reduced capacity to socially distance (as evidenced by persistently higher subway use), and higher mortality in NYC during Spring 2020. The composite, supervised inequity index captures a correlated mixture of socioeconomic and infrastructural factors, with uninsured rates, household crowding, and essential worker commuting patterns playing prominent roles. Disparities by race/ethnicity in index distributions align with structural racism concentrating people of color in higher-disadvantage neighborhoods, which likely facilitated viral spread and contributed to worse outcomes. The index’s strong performance versus simpler or unsupervised alternatives and its association with mortality suggest utility for identifying high-risk neighborhoods and tailoring interventions (e.g., targeted testing, isolation support like hotel programs, outreach in areas with larger households). The study complements and extends literature linking socioeconomic disadvantage to COVID-19 burden and mobility constraints, emphasizing that uniform social distancing policies may have unequal feasibility and impacts across neighborhoods. Public health planning should incorporate structural factors to achieve equitable outcomes.
Conclusion
The study introduces a supervised, ZCTA-level COVID-19 inequity index that quantifies neighborhood social disadvantage associated with infection and mortality risk in NYC and demonstrates reduced capacity to socially distance in high-index areas. The approach can inform targeted public health interventions and resource allocation by highlighting the mixture of social factors underlying elevated risk. Future work should assess generalizability to other regions and time periods, adapt the approach to local contexts (yielding region-specific variable sets), and evaluate its relevance for future waves or other respiratory pathogens. Care should be taken to avoid stigmatization and to use the index to support equitable, community-informed interventions.
Limitations
Key limitations include:
(1) lack of a ZCTA-level measure of multigenerational housing;
(2) exclusion of race/ethnicity from the index, so interpersonal and structural racism are not modeled directly;
(3) infection data based on viral swab-confirmed cases during periods of limited testing (initially hospitalized/severe cases), potentially confounding incidence with severity; models adjusted for testing intensity, but residual bias may remain;
(4) absence of ZCTA-level chronic disease measures in mortality models; adjusting for them may over-control mediators, but omitting them could confound associations; spatial filtering was used to mitigate unmeasured spatial confounding;
(5) use of pre-pandemic ACS data that do not capture pandemic-related residential mobility (e.g., temporary moves from affluent areas), partially proxied by income;
(6) transit analysis relied on subway turnstiles, lacking time-varying bus ridership (buses were free, limiting data availability);
(7) BWQS, while suitable for correlated exposures, had not previously been applied to social/infectious disease epidemiology;
(8) reliance on ZCTAs as analysis units, which can be heterogeneous and are not decision-making units; tract-level sensitivity analyses showed consistency;
(9) potential for neighborhood stigmatization with an inequity index; users should focus on structural drivers and equitable intervention planning.