Disparate impacts on online information access during the Covid-19 pandemic

J. Suh, E. Horvitz, et al.

This study by Jina Suh, Eric Horvitz, Ryen W. White, and Tim Althoff examines how the COVID-19 pandemic affected digital access to health and economic resources across US communities. It uncovers varying patterns of resource utilization that highlight socioeconomic and environmental disparities.
Introduction

The study investigates how the COVID-19 pandemic altered online information access and whether changes differed across communities along socioeconomic and environmental dimensions. Motivated by documented disparities in health and socioeconomic burdens during COVID-19—especially among socioeconomically disadvantaged and minority communities—the authors examine the second-level digital divide (differences in usage and skills) via population-scale web search logs. The purpose is to quantify how offline exclusion (e.g., low income, unemployment, limited health insurance) relates to intensification or attenuation of digital engagement (e.g., seeking health information, accessing unemployment benefits, remote learning, food delivery). The study frames these questions within the Social Determinants of Health (SDOH) to identify which community characteristics are associated with different digital engagement responses during the pandemic and to highlight potential downstream implications for health, education, and economic outcomes.

Literature Review

Prior work on disparities during COVID-19 has emphasized epidemiological outcomes, showing disproportionate infection and mortality among disadvantaged and predominantly Black counties. Digital access has been increasingly recognized as influencing health outcomes via online health information and telehealth use. The digital divide spans: first-level (infrastructure access/quality), second-level (usage and skills), and third-level (differential offline benefits derived from digital use). Traditional approaches to studying digital disparities often rely on surveys or datasets limited to specific services, domains, or geographies, missing broad, real-time behavioral coverage. Web search logs, routinely collected at scale, have enabled research across domains and can reveal needs and barriers not captured by official statistics (e.g., unemployment claims). Literature shows SES, race/ethnicity, and digital literacy influence engagement with capital-enhancing activities online, with implications for health, education, and employment. This study extends digital disparities research by using near real-time, population-scale search data across multiple domains mapped to SDOH categories.

Methodology

Design: Retrospective longitudinal observational study using Bing search logs to quantify changes in digital engagement before and during the COVID-19 pandemic, analyzed at the ZIP code level and structured by SDOH factors.
Data: A random sample of approximately 57 billion de-identified Bing search interactions (queries and subsequent clicked URLs, timestamps, inferred ZIP code) from the United States for 2019–2020; after exclusions and joins with census data, 55 billion interactions from 25,150 ZIP codes covering ~97.2% of the US population. Both desktop and mobile interactions were included; device-type analyses were out of scope.
Privacy: Data were de-identified, aggregated to ZIP code or higher, and approved by Microsoft Research IRB as Not Human Subjects Research, with additional privacy, security, and legal review.
ZIP-level covariates (SDOH): Eight census variables representing five SDOH categories: Healthcare Access and Quality (% with health insurance); Education Access and Quality (% with BA or higher); Social and Community Context (% Black; % Hispanic); Economic Stability (median household income; % unemployed); Neighborhood and Built Environment (% with internet subscription; population density via ZCTA Gazetteer area and population).
Grouping and thresholds: For each SDOH factor, ZIP codes were split into high-risk (treatment) and low-risk (control) groups using thresholds near medians where applicable (e.g., income $55,224; unemployment 3.0%; insurance 92.7%; internet access 81.8%; BA+ 21.1%) or national population averages for race/ethnicity (12% Black; 18% Hispanic) and 500 people/sq mi for density. Treatment groups were consistently defined as higher-risk: low income, high % minority, low education, high unemployment, low insurance, low internet access, high density.
Matching: One-to-one nearest-neighbor matching with replacement using Mahalanobis distance (MatchIt), balancing all other SDOH covariates.
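As a rough illustration of the grouping step, the sketch below splits ZIP codes into high-risk (treatment) and low-risk (control) groups using the thresholds quoted above. The factor names, the `RiskRule` structure, and the example ZIP codes are hypothetical. Assigning ties at the threshold to the high-risk group is an assumption, and the study's actual pipeline may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskRule:
    threshold: float
    high_risk_when: str  # "below" or "above" the threshold


# Thresholds as quoted in the methodology; the factor names are hypothetical.
RISK_RULES = {
    "median_household_income": RiskRule(55_224, "below"),
    "pct_unemployed": RiskRule(3.0, "above"),
    "pct_health_insurance": RiskRule(92.7, "below"),
    "pct_internet_subscription": RiskRule(81.8, "below"),
    "pct_ba_or_higher": RiskRule(21.1, "below"),
    "pct_black": RiskRule(12.0, "above"),
    "pct_hispanic": RiskRule(18.0, "above"),
    "population_density": RiskRule(500.0, "above"),
}


def is_high_risk(factor: str, value: float) -> bool:
    """Return True if the value falls on the higher-risk side of the threshold.

    Assumption: values exactly at the threshold count as high-risk.
    """
    rule = RISK_RULES[factor]
    if rule.high_risk_when == "below":
        return value <= rule.threshold
    return value >= rule.threshold


def split_zip_codes(
    values_by_zip: dict[str, float], factor: str
) -> tuple[list[str], list[str]]:
    """Split ZIP codes into (treatment, control) lists for one SDOH factor."""
    treatment, control = [], []
    for zip_code, value in sorted(values_by_zip.items()):
        (treatment if is_high_risk(factor, value) else control).append(zip_code)
    return treatment, control


if __name__ == "__main__":
    # Made-up ZIP codes and incomes, for illustration only.
    incomes = {"00001": 41_000, "00002": 72_500, "00003": 55_224}
    print(split_zip_codes(incomes, "median_household_income"))
```

Keeping the direction of risk next to each threshold makes it explicit that "treatment" always means the higher-risk side, whichever way the variable runs.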
Balance: Assessed via standardized mean difference (|SMD| < 0.25) across all covariates, with calipers tuned to achieve balance. Matching retained at least 99.8% of treatment ZIPs.
Search categories: Categories reflect key needs across SDOH: health information (e.g., health conditions, including coronavirus), economic assistance (unemployment queries; clicks to state unemployment sites; financial assistance/stimulus), education (clicks to free online learning sites), and food access (online food delivery; food assistance). Detection used English-language regular expressions on query strings and/or click URLs; some click-based detectors were language independent.
Measures: Digital engagement is expressed as the proportion of total queries in a time window that belong to a category, E(t, c) = N(t, c) / N(t), aggregated over 2- or 4-week windows per ZIP.
Difference-in-differences (DiD): To isolate pandemic-related changes, the study (1) aligned weekdays across years to control for weekly periodicity; (2) subtracted 2019 seasonal baselines from 2020; and (3) subtracted a pre-pandemic baseline (Jan 6–Feb 23, 2020) from during-pandemic periods (post Mar 16, 2020) to compute relative percentage changes Cperc. Analyses were performed at the matched-group level (sum within groups before differencing) to mitigate sparse-baseline issues. Disparities were quantified as percentage point differences in Cperc between high-risk and low-risk matched groups for a given SDOH factor.
Uncertainty: 95% non-parametric confidence intervals were computed via bootstrapping with replacement (500 iterations) at the aggregation step.
Validation/coverage: Bing's query-based market share is ~26.7% (Comscore). Trends compared with Google for matched categories showed high correlations (Pearson r = 0.86–0.98). Location inference uses a proprietary engine with enhancements beyond reverse IP; the demographics of search users are assumed to approximate ZIP populations, with known biases discussed.
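The engagement measure and the difference-in-differences steps described above can be sketched as follows. This is a minimal illustration, not the authors' code. The function names are hypothetical, weekday alignment is assumed to be a 52-week shift, and the normalization (dividing by the 2020 pre-pandemic engagement) is an assumption, because the summary does not give the exact formula for Cperc.

```python
from datetime import date, timedelta


def same_weekday_in_2019(day_2020: date) -> date:
    """Shift a 2020 date back exactly 52 weeks so the weekday is preserved.

    Assumption: this is one simple way to align weekdays across years.
    """
    return day_2020 - timedelta(weeks=52)


def engagement(category_count: int, total_count: int) -> float:
    """E(t, c) = N(t, c) / N(t): share of all queries in a window that fall in a category."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return category_count / total_count


def relative_change(
    e_pre_2019: float,
    e_during_2019: float,
    e_pre_2020: float,
    e_during_2020: float,
) -> float:
    """Difference-in-differences change, as a percentage of the 2020 baseline.

    Step 2: subtract the 2019 value from the 2020 value in each period.
    Step 3: subtract the pre-pandemic period from the during-pandemic period.
    The choice of denominator is an assumption made for this sketch.
    """
    if e_pre_2020 == 0:
        # A sparse baseline makes the percentage undefined; the study sums
        # within matched groups before differencing to mitigate this.
        raise ValueError("pre-pandemic baseline engagement is zero")
    seasonal_pre = e_pre_2020 - e_pre_2019
    seasonal_during = e_during_2020 - e_during_2019
    return 100.0 * (seasonal_during - seasonal_pre) / e_pre_2020


if __name__ == "__main__":
    # Made-up group-level counts out of 10,000 queries per window.
    c_perc = relative_change(
        e_pre_2019=engagement(100, 10_000),
        e_during_2019=engagement(110, 10_000),
        e_pre_2020=engagement(100, 10_000),
        e_during_2020=engagement(310, 10_000),
    )
    print(f"Relative change: {c_perc:.1f}%")
```

The inputs are meant to be group-level sums rather than per-ZIP values, mirroring the study's choice to aggregate within matched groups before differencing.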
Scope: Analyses at ZIP code level; individual-level inferences avoided; causal claims not made given universal exposure to the pandemic; English-only regex for queries; broad categories examined rather than subcomponents.
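The balance criterion in the methodology (|SMD| < 0.25 on every covariate) can be checked with a few lines of code. The sketch below uses one common definition of the standardized mean difference, with a pooled standard deviation in the denominator. That definition is an assumption: the summary does not say which standardization the study or MatchIt applied. The function names and covariate values are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated: list[float], control: list[float]) -> float:
    """Difference in group means divided by the pooled standard deviation.

    Each group needs at least two values so a sample variance exists.
    """
    diff = mean(treated) - mean(control)
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def is_balanced(
    treated_by_covariate: dict[str, list[float]],
    control_by_covariate: dict[str, list[float]],
    limit: float = 0.25,
) -> bool:
    """Return True if |SMD| is below the limit for every covariate."""
    return all(
        abs(
            standardized_mean_difference(
                treated_by_covariate[name], control_by_covariate[name]
            )
        )
        < limit
        for name in treated_by_covariate
    )


if __name__ == "__main__":
    # Made-up covariate values for matched treatment and control ZIP codes.
    treated = {"pct_ba_or_higher": [50.0, 52.0, 54.0]}
    control = {"pct_ba_or_higher": [51.0, 52.0, 54.0]}
    print(is_balanced(treated, control))
```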

Key Findings
  • Health information access: Overall, the proportion of health-condition queries increased nearly 1000% relative to the pre-pandemic baseline. Contrary to expectations, low-income ZIP codes showed an increase more than 200 percentage points smaller than that of high-income ZIP codes (95% CI [-287, -152]). ZIP codes with higher proportions of Hispanic residents, higher population densities, and higher unemployment rates also showed lower relative changes during the first four weeks. ZIP codes with lower educational attainment (≤21.1% BA+) showed an increase more than 70 percentage points larger than that of higher-education ZIP codes (95% CI [31, 117]).
  • Economic assistance (unemployment): Unemployment-related queries mirrored the timing of BLS claims. ZIP codes with higher proportions of Black residents (≥12%) had a 3026% increase versus 1365% in lower-Black ZIPs, a disparity of 1661 percentage points (95% CI [260, 2374]). A second surge in clicks to state unemployment sites occurred after July 2020, following the expiration of the federal supplement. In August, higher-Black and higher-Hispanic ZIPs showed larger increases in unemployment-site clicks: +789 pp (95% CI [595, 957]) and +716 pp (95% CI [351, 1043]), respectively. Low-education ZIPs showed a difference of -517 pp (95% CI [-1009, -81]) in such clicks relative to higher-education ZIPs.
  • Economic assistance (financial stimulus): Financial assistance-related queries peaked in mid-April 2020, increasing by over 15,000% on average. Between Apr 13 and May 10, ZIP codes with higher proportions of Black residents showed a change in such queries that was 5,119 percentage points smaller (95% CI [-8809, -1407]), despite matching on income and education.
  • Education (online learning): Clicks to free online learning sites increased by over 200% early in the pandemic. Low-income ZIPs and higher-Hispanic ZIPs exhibited only half to two-thirds of the increase seen in their counterparts (pp differences 95% CI [-227, -109] and [-202, -46], respectively). Similar attenuations were observed for higher-Black and higher-density ZIPs. In fall 2020, overall engagement attenuated; low-income and higher-unemployment ZIPs showed smaller attenuation, whereas higher-Black ZIPs showed larger attenuation.
  • Food access (online food delivery): Online food delivery queries rose by more than 500% in ZIPs with lower proportions of Black residents, compared with more than 170% in higher-Black ZIPs (disparity 95% CI [-382, -188]). Engagement was similarly reduced in lower-income and higher-Hispanic ZIPs (95% CI [-200, -29] and [-140, -24], respectively).
  • Food assistance: ZIPs with lower educational attainment showed an increase in food assistance-related queries 301 percentage points larger than that of higher-education ZIPs (95% CI [167, 419]), highlighting increased need where digital food purchase/delivery options were often not supported by assistance programs.
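Each disparity above is a difference in percentage points between the relative changes of the high-risk and low-risk matched groups, reported with a bootstrap confidence interval. The sketch below reproduces the arithmetic for the unemployment example (3026% versus 1365%) and shows a simple percentile bootstrap with 500 iterations. The helper names and the resampling of per-ZIP `(category_count, total_count)` pairs are assumptions about what "bootstrapping at the aggregation step" involves, not the authors' implementation.

```python
import math
import random


def disparity_pp(change_high_risk: float, change_low_risk: float) -> float:
    """Disparity in percentage points between two relative changes (in %)."""
    return change_high_risk - change_low_risk


def pooled_engagement(units: list[tuple[int, int]]) -> float:
    """Sum category and total counts across ZIP codes, then take the ratio."""
    category = sum(c for c, _ in units)
    total = sum(t for _, t in units)
    return category / total


def bootstrap_ci(units, statistic, iterations=500, level=0.95, seed=0):
    """Percentile bootstrap: resample units with replacement, recompute the statistic."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic(rng.choices(units, k=len(units))) for _ in range(iterations)
    )
    alpha = (1.0 - level) / 2.0
    lower = estimates[math.floor(alpha * (iterations - 1))]
    upper = estimates[math.ceil((1.0 - alpha) * (iterations - 1))]
    return lower, upper


if __name__ == "__main__":
    # Figures quoted in the unemployment finding above.
    print(disparity_pp(3026, 1365))  # 1661 percentage points

    # Made-up per-ZIP (category_count, total_count) pairs, for illustration only.
    zip_counts = [(12, 900), (30, 1500), (7, 400), (22, 1100), (15, 1000)]
    print(bootstrap_ci(zip_counts, pooled_engagement))
```

Resampling whole ZIP codes and re-aggregating keeps the statistic defined on group-level sums, which matches the study's stated preference for aggregating before differencing.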
Discussion

The analysis demonstrates substantial, uneven shifts in digital engagement across communities during COVID-19, with disparities aligned to SDOH factors. Despite overall surges in health information seeking, low-income and higher-minority ZIP codes increased engagement less in several domains, suggesting potential barriers to accessing and leveraging digital resources. Conversely, some categories (e.g., unemployment queries in higher-Black ZIPs; food assistance in low-education ZIPs) showed heightened demand not fully reflected in official statistics, signaling unmet needs or barriers to successful benefit acquisition. Framed within SDOH, the results suggest determinant-specific vulnerabilities. Economic instability is associated with smaller increases in health information seeking and online learning. Social and community context (higher % Black/Hispanic) is associated with smaller increases in health information seeking, online learning, and food delivery, but with larger unemployment-related engagement. Lower education is associated with larger increases in health information seeking and food assistance. Higher population density is associated with smaller increases in health information seeking and online learning. These patterns underscore the bidirectional relationship between digital and offline exclusion, where deficits in digital engagement can exacerbate inequalities in health, education, and economic outcomes.

Conclusion

This study introduces a web-based, population-scale approach to quantify disparities in digital engagement during a global crisis, leveraging search logs mapped to SDOH. It identifies at-risk communities that either disproportionately intensified or reduced access to critical online resources across health, education, economic assistance, and food. Findings motivate determinant-specific interventions to reduce barriers (e.g., enhancing digital literacy, improving quality of access, simplifying benefit processes) and to ensure surges in need are met. Future research directions include: measuring digital literacy via search interaction patterns; examining device-specific access and quality; conducting focused, small-scale studies to contextualize experiences and assess community-specific interventions; and modeling the long-term offline impacts of observed digital engagement gaps, particularly for low-income communities’ use of online learning and sustained unemployment assistance needs in higher-Hispanic communities.

Limitations
  • Online data may exclude individuals with minimal or no digital footprint; results reflect activity among users who engage online.
  • Limited to Bing users; while trends correlate highly with Google, Bing’s user base may not fully represent the US population.
  • Query detectors primarily use English-language regular expressions; cross-language analyses are out of scope.
  • ZIP-level analysis cannot attribute behaviors to individuals; SDOH factors at individual level are unavailable.
  • Broad category analysis precludes claims about specific subcomponents or keywords.
  • Causal inference is limited: universal exposure to the pandemic prevents counterfactual comparison; DiD adjustments remove seasonality and baselines but cannot establish causality.
  • Location inference to ZIP codes is an approximation; demographic assumptions about search users and ZIP populations may introduce bias.
  • Device-type differences were not analyzed despite inclusion of desktop and mobile interactions.
  • Although internet access was controlled in matching analyses, not all unobserved confounders can be ruled out.