Large-scale diet tracking data reveal disparate associations between food environment and diet

T. Althoff, H. Nilforoshan, et al.

This study by Tim Althoff, Hamed Nilforoshan, Jenna Hua, and Jure Leskovec examines how access to grocery stores and fast food, together with socioeconomic factors, is associated with diet among over a million MyFitnessPal users in the US. The findings reveal significant disparities based on community demographics, underscoring the need for targeted dietary interventions.
Introduction

Dietary factors contribute substantially to global mortality and chronic disease risk, including cardiovascular disease, type 2 diabetes, and cancer. Emerging evidence indicates that the built and food environments, along with behavioral and socioeconomic factors, affect diet, yet prior studies have yielded mixed results, often because of limitations such as small samples, localized contexts, heterogeneous populations, and non-uniform measures of food environment and diet. There is a need for large-scale research using consistent methodologies and measurements. With widespread smartphone ownership and abundant geospatial data, it is now possible to combine individual diet logs with population demographics, socioeconomic status, and food environment measures. This study leverages data from 1,164,926 MyFitnessPal participants across 9,822 U.S. zip codes and integrates Internet data sources to quantify the independent associations of grocery and fast food access, income, and educational attainment with food consumption and BMI status, constituting the largest nationwide study of the food environment–diet relationship to date.

Literature Review

Prior research on food environments and diet has been extensive but inconclusive, with systematic reviews documenting mixed associations between local food access and dietary behaviors or obesity. Methodological challenges underpin these inconsistencies, including small sample sizes, geographic heterogeneity, varied populations, and differing measures of both diet and the food environment (e.g., screeners, FFQs, 24-h recalls). Reviews have highlighted the need for standardized, scalable measures and consideration of socioeconomic determinants. Studies have examined the roles of retail environments, built environment, and social determinants, with socioeconomic status and education frequently linked to dietary quality and obesity risk. Evidence regarding interventions such as new grocery store openings shows limited direct impact on diet without complementary measures, whereas incentives for fruit and vegetable purchases have shown promise. Overall, the literature suggests complex, context-dependent relationships shaped by SES, education, and neighborhood characteristics, warranting large-scale, consistent assessments and subgroup analyses.

Methodology

Design: Nationwide cross-sectional observational study in the United States analyzing associations of zip code–level food environment and socioeconomic factors with diet and BMI outcomes.

Population: 1,164,926 MyFitnessPal (MFP) app users across 9,822 U.S. zip codes, who logged 2.3 billion food entries from January 1, 2010 to November 15, 2016. Zip codes were included only if they had ≥30 participants. Participants averaged 9.30 entries per day and 197 days of app use; all used the app for at least 10 days.

Data sources: (1) diet and BMI from MFP; (2) demographics and SES (median family income; fraction with a bachelor’s degree or higher; race/ethnicity composition) from ACS 2010–2014 via CensusReporter; (3) grocery access from the USDA Food Access Research Atlas; (4) fast food access from Yelp business listings.

Outcome measures: At the participant level, the average number of entries per day in each category, aggregated to zip code means. Categories: fresh fruits & vegetables (F&V) (proprietary classifier consistent with USDA MyPlate, excluding juices), fast food (brand matching against a chain list), and sugary non-diet soda (brand matching; excluding diet/lite/light/zero). BMI status outcome: fraction of participants with BMI > 25 (overweight/obesity). Classification relied on binary classifiers using normalized brand/description text; precision was assessed via manual review of 50 random items per category (recall was not measured). Clustering was handled by aggregating entries within each person-day, then across days per participant, then across participants within each zip code.

Food environment measures: Grocery access was defined as the fraction of the population living more than 0.5 miles from a grocery store (a threshold selected because it showed the strongest correlation with F&V, even in rural zip codes; it contrasts with USDA’s 10-mile rural threshold). Census tract–level measures were aggregated to zip codes using the HUD USPS Crosswalk, weighted by population. Food desert status was used for validation.
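The two aggregation steps described above (person-day, then participant, then zip code for outcomes; population-weighted tract-to-zip aggregation for grocery access) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the row layouts, and the generic `weight` column are assumptions made for the example, and it assumes every logging day appears in the input at least once (with a count of 0 if nothing in the category was logged).

```python
from collections import defaultdict
from statistics import mean


def zip_level_mean(entries):
    """Aggregate category entries to zip code means.

    entries: iterable of (zip_code, user_id, day, count) rows (hypothetical layout).
    """
    # Step 1: sum entries within each person-day.
    per_day = defaultdict(float)
    for zip_code, user, day, count in entries:
        per_day[(zip_code, user, day)] += count

    # Step 2: average across days for each participant.
    days_by_user = defaultdict(list)
    for (zip_code, user, _day), total in per_day.items():
        days_by_user[(zip_code, user)].append(total)

    # Step 3: average across participants within each zip code.
    users_by_zip = defaultdict(list)
    for (zip_code, _user), daily_totals in days_by_user.items():
        users_by_zip[zip_code].append(mean(daily_totals))
    return {z: mean(v) for z, v in users_by_zip.items()}


def tract_to_zip(tract_values, crosswalk):
    """Weighted average of tract-level values per zip code.

    tract_values: {tract_id: value}
    crosswalk: iterable of (tract_id, zip_code, weight) rows, where weight
    stands in for the population share (an assumption of this sketch).
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for tract, zip_code, weight in crosswalk:
        if tract in tract_values:
            numerator[zip_code] += weight * tract_values[tract]
            denominator[zip_code] += weight
    return {z: numerator[z] / denominator[z] for z in numerator if denominator[z] > 0}


# Toy data: user u1 logs 3 entries on d1 and 1 on d2; user u2 logs 4 on d1.
toy_entries = [
    ("94305", "u1", "d1", 2), ("94305", "u1", "d1", 1),
    ("94305", "u1", "d2", 1), ("94305", "u2", "d1", 4),
]
toy_outcome = zip_level_mean(toy_entries)

# Two tracts feeding one zip code with weights 300 and 100.
toy_access = tract_to_zip({"t1": 0.25, "t2": 0.75},
                          [("t1", "94305", 300), ("t2", "94305", 100)])
```

Averaging per participant before averaging per zip code keeps heavy loggers from dominating the zip-level mean, which is the point of aggregating in stages rather than pooling all entries.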
Fast food access was measured as the fraction of restaurants that are fast food among up to the 1,000 Yelp businesses nearest to the zip code center, within a maximum radius of 40 km; the effective sampling radius varied with urbanicity.

Validation and convergent validity: MFP outcomes were compared to BRFSS county-level measures for F&V (FV5SRV, 2011) and BMI (2012), and USDA/Nielsen Homescan findings on food deserts were reproduced across categories using proprietary classifiers. Correlations among Mexican food entries, Mexican restaurant share, and Hispanic population share were also checked.

Statistical analysis: A matching-based approach (one-to-one Genetic Matching with replacement) was used to estimate the independent associations of above-median vs below-median values of each treatment factor (income, educational attainment, grocery access, fast food access) with outcomes. Matching achieved balance on non-treatment covariates (SMD < 0.25 across variables; typical mean SMD ~0.040 overall). The estimand was the Average Treatment Effect on the Treated (ATT). Subpopulation analyses were repeated within zip codes that were predominantly Black, Hispanic, or non-Hispanic white. A non-parametric bootstrap with 1,000 replications at the zip code level was used for CIs and p-values; results were qualitatively similar with t-tests. Dose-response was explored via top vs bottom quartile comparisons (Supplementary). Age and gender were not included as covariates because of minimal zip code–level correlations with outcomes and near-identical results when included (R = 0.95). Discriminant validity was tested via null treatments (e.g., Yelp categories unrelated to food), which yielded null effects.

Data availability: Zip code–aggregated data and code are provided online.

Ethics: Conducted per MFP policies and Stanford IRB guidelines.
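To make the balance check and the bootstrap in the statistical analysis concrete, here is a minimal sketch under stated assumptions. The matching itself is taken as given: the input is a list of already matched (treated, control) outcome pairs. The pooled-standard-deviation form of the SMD, the expression of the ATT relative to the matched-control mean, and the percentile bootstrap that resamples matched pairs are illustrative choices; the summary does not say which variants the authors used, and all function names are hypothetical.

```python
import random
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute SMD using the pooled SD sqrt((var_t + var_c) / 2) (assumed form)."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    diff = abs(mean(treated) - mean(control))
    if pooled_sd == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def relative_att(pairs):
    """Mean treated-minus-control difference, relative to the control mean.

    pairs: list of (treated_outcome, matched_control_outcome), one per treated zip code.
    """
    att = mean(t - c for t, c in pairs)
    return att / mean(c for _, c in pairs)


def bootstrap_ci(pairs, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling matched pairs with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        relative_att(rng.choices(pairs, k=len(pairs))) for _ in range(reps)
    )
    lower_index = int(reps * alpha / 2)
    upper_index = reps - 1 - lower_index
    return stats[lower_index], stats[upper_index]


# Toy matched pairs (invented numbers, not study data).
toy_pairs = [(110, 100), (105, 100), (98, 100), (107, 100)]
toy_estimate = relative_att(toy_pairs)
toy_ci = bootstrap_ci(toy_pairs)
```

A covariate would count as balanced in the sense used above when `standardized_mean_difference` on its treated and matched-control values is below 0.25. Resampling whole pairs keeps each treated zip code together with its match; a fuller procedure might rerun the matching inside every replication.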

Key Findings

Validation: MFP F&V consumption correlated with BRFSS F&V (R = 0.63, p < 1e-5); MFP BMI correlated with BRFSS BMI (R = 0.78, p < 1e-5). MFP-based differences between food deserts and non-food deserts reproduced Nielsen Homescan patterns (R = 0.88, p < 0.01).

Overall associations (above vs below median; all p < 0.001 unless noted):
• Higher grocery store access: +3.4% F&V; −7.6% fast food; −6.4% soda; −2.4% overweight/obesity.
• Lower fast food access: +5.3% F&V; −6.2% fast food; −13.3% soda; −1.5% overweight/obesity.
• Higher educational attainment: +9.2% F&V; −8.5% fast food; −13.8% soda; −13.1% overweight/obesity.
• Higher income: +3.3% F&V; −6.8% fast food; −8.6% soda; +0.6% overweight/obesity (P = 0.006).
Effect sizes generally increased when comparing top vs bottom quartiles, suggesting a possible dose-response relationship.

Subpopulation differences:

Predominantly Black zip codes (3.7% of sample):
• Higher income associated with lower diet health: −6.5% F&V; +5.5% fast food; +8.1% overweight/obesity; +14.1% soda (not significant, P = 0.061).
• Lower fast food access: largest reduction in fast food (−12.0%), but associated with a slight increase in overweight/obesity (+3.1%).
• Higher educational attainment: strongest positive F&V difference (+11.2%).
• Higher grocery access: +10.2% F&V; −12.6% fast food; −9.0% overweight/obesity; −5.3% soda (not significant, P = 0.060).

Predominantly Hispanic zip codes (5.6%):
• Higher income: +5.7% F&V; no significant associations for fast food, soda, or overweight/obesity.
• Higher educational attainment: +8.9% F&V; −11.9% fast food; −16.5% soda; −13.7% overweight/obesity.
• Higher grocery access: +7.4% F&V (more than twice the overall population difference); similar effects to overall for overweight/obesity.
• Fast food access: not significantly associated with soda or F&V.

Predominantly non-Hispanic white zip codes (78.4%): Patterns similar to overall results.

Summary: Across groups, F&V consumption was higher with high grocery access and high educational attainment; fast food consumption was lower for most intervention targets except higher income; soda consumption was lowest mostly with lower fast food access (predominantly Black and white zip codes) or higher education (predominantly Hispanic zip codes). Education showed the largest relative reduction in overweight/obesity across all groups.

Discussion

Smartphone-based health data can inform population-level diet and obesity research at unprecedented scale and granularity. The study’s findings generally align with prior work on the roles of food access and socioeconomic factors, but highlight educational attainment as the strongest independent correlate of lower overweight/obesity prevalence. Importantly, associations differ markedly by neighborhood racial/ethnic composition, indicating that interventions should be tailored to local contexts and assets. The disparate associations—for example, higher income correlating with less healthful dietary patterns and higher overweight/obesity in predominantly Black zip codes—underscore the need for nuanced, equity-focused strategies. Consistent improvements in F&V with greater grocery access and education, and reductions in fast food and soda with better environments and education, suggest multiple levers for policy. The results support optimizing intervention allocation by subpopulation and geography to maximize impact.

Conclusion

Analyzing 2.3 billion food logs and self-reported BMIs from 1.16 million U.S. MFP users across 9,822 zip codes, the study finds that higher grocery store access, lower fast food access, higher income, and higher educational attainment are each independently associated with healthier dietary behaviors, and that all of these except higher income are associated with lower overweight/obesity prevalence (higher income showed a small increase of +0.6%). These associations vary substantially across predominantly Black, Hispanic, and white zip codes, with education consistently showing the strongest association with reduced overweight/obesity. These insights suggest that policies enhancing food access and educational attainment may promote healthier eating, but intervention design and allocation should be tailored to specific subpopulations and locations. Future research should employ longitudinal designs with detailed individual-level data to enable causal inference and refine intervention targeting.

Limitations

The cross-sectional design precludes causal inference; unobserved neighborhood and individual factors may confound associations. The MFP user base is not nationally representative and skews toward women and higher incomes; certain regions (e.g., parts of the Midwest, Alaska), majority non-white zip codes, and rural zip codes were underrepresented. Dietary measures are based on counts of food entries rather than standardized quantities, and classifier recall was not assessed (only precision, via manual samples). BMI is self-reported. The 0.5-mile threshold for grocery access deviates from USDA’s rural definition, though it is supported by correlations. Bootstrapping was performed at the zip code level. Although age and gender were omitted from matching because of minimal zip code–level correlations, residual confounding is possible. Yelp-derived measures may have coverage limits (API caps, urbanicity variation).
