The association between urban land use and depressive symptoms in young adulthood: a FinnTwin12 cohort study
Z. Wang, A. M. Whipp, et al.
This study by Zhiyang Wang, Alyce M. Whipp, Marja Heinonen-Guzejev, Maria Foraster, Jordi Júlvez, and Jaakko Kaprio examines the relationship between urban land use and depressive symptoms among young adults. In suburban-like areas, agricultural residential land use close to the home was associated with more depressive symptoms, whereas in city-center-like areas the linear models selected no land use exposure. The results point to context-dependent and partly nonlinear links between urban environments and mental health.
Introduction
The study addresses how urban land use relates to depressive symptoms among young adults, a growing public health concern with both genetic and environmental determinants. Prior evidence suggests urban environmental profiles, including land use, can influence mental health, yet results are inconsistent and conventional single-index approaches may not capture complex, potentially interacting, and nonlinear effects. The authors hypothesized that complex relationships between multiple land use characteristics and depressive symptoms exist and cannot be adequately quantified by conventional indices. The objectives were to: (a) cluster participants with similar urban land use profiles; (b) assess linear and nonlinear associations between land use and depressive symptoms in young adulthood; and (c) evaluate whether these associations differ across clusters of urban land use environments.
Literature Review
Past research indicates urbanization and land use transformations affect health, including mental health, but findings are mixed. UK Biobank work linked specific urban environmental profiles to mental health via brain structural changes, and Finnish data suggested urban environment variables relate to lower incidence of serious mental illness. Prior studies using land use indices report inconsistent associations with mental health, partly due to limitations of indices that may miss interactions among land use types. Methodological literature recommends multi-exposure, interpretable statistical and machine learning models to address high-dimensional, correlated exposures and small effect sizes. Machine learning has been used to examine urban exposome links with metabolic outcomes (e.g., type-2 diabetes), and simulation studies have compared multi-exposure modeling strategies. However, such multi-exposure approaches are rare in mental health research, motivating the present study.
Methodology
Design and participants: Population-based prospective FinnTwin12 cohort of all Finnish twins born 1983–1987. Twins completed questionnaires at ages ~11–12 (baseline), 14 (first follow-up), 17 (third wave), and young adulthood (wave four, 2004–2012). For this analysis, 1804 individual twins (589 complete twin pairs and 626 singletons) living in urban Finland in 2012 with available General Behavior Inventory (GBI) data in young adulthood and geocoded residence in 2012 were included (mean age at GBI: 24.07 years).

Measures: Depressive symptoms were assessed using the 10-item short-form GBI (0–30, higher indicates more symptoms). Validity was examined against DSM-IV major depressive disorder (MDD) via SSAGA interviews in a subset; young-adulthood GBI predicted MDD with AUC 0.8328. Land use exposures were derived by linking 2012 EUREF-FIN residential geocodes to Urban Atlas 2012. Eight land use categories were quantified as percent coverage within 100 m, 300 m, and 500 m buffers (24 exposures): high-density residential, low-density residential, industrial and commercial, infrastructure, urban green, agricultural residential, natural, and water. A land use mix index (Shannon’s Evenness Index; range 0–1) was also calculated for each buffer.

Covariates: Demographics included sex, zygosity (MZ/DZ/unknown), parental education (limited, intermediate, high), smoking (never/former/occasional/current), work status (full-time/part-time/irregular/not working), secondary school type (vocational/senior high/none), and age at GBI. Social indicators at 2012 postal-code level were added to account for socioeconomic context: age structure (% >18), education level (% ≥ bachelor’s among ≥16), unemployment rate (% among ages 25–54), and income level (% households in top national income quartile).

Analysis: GBI was log-transformed after adding one due to skewness. Correlations among land use exposures were examined.
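The land use mix index and the outcome transformation described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the evenness index is normalized by the logarithm of the total number of land use categories (eight here) and that the log transformation is the natural logarithm; the source states neither detail. The function names `shannon_evenness` and `log_gbi` are hypothetical.

```python
import math


def shannon_evenness(percentages, n_categories=None):
    """Shannon's Evenness Index for land use shares within one buffer.

    Assumption: the index is H / ln(k), where H is the Shannon entropy of
    the land use shares and k is the total number of categories considered.
    Returns 0 when one category covers everything, 1 when all k are equal.
    """
    total = sum(percentages)
    if total <= 0:
        raise ValueError("at least one land use share must be positive")
    k = n_categories if n_categories is not None else len(percentages)
    if k < 2:
        return 0.0
    # Categories with zero coverage contribute nothing to the entropy.
    shares = [p / total for p in percentages if p > 0]
    entropy = -sum(s * math.log(s) for s in shares)
    return entropy / math.log(k)


def log_gbi(score):
    """Log-transform a GBI score after adding one (natural log assumed)."""
    return math.log(score + 1)


# Hypothetical percent coverage for the eight categories in one buffer.
buffer_shares = [40.0, 25.0, 10.0, 10.0, 5.0, 5.0, 3.0, 2.0]
mix_index = shannon_evenness(buffer_shares)
```

An index near 1 indicates that the buffer is split evenly across categories, while an index near 0 indicates that a single land use type dominates.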
Unsupervised clustering: K-means clustering was applied to the 24 land use percentages, with the optimal K chosen by the Silhouette method. Two clusters emerged: a suburban-like cluster (Cluster 1) and a city-center-like cluster (Cluster 2).

Modeling strategy: Data were split into training (n=1215) and testing (n=589) sets. The two twins of each complete pair were placed in different sets and all singletons were assigned to training, so individuals within each set were unrelated to one another. Two model families were applied to assess multi-exposure relationships:
(1) Linear elastic net penalized regression for feature selection and linear effects, with 10-fold cross-validation to select the penalization (lambda; alpha tuned), forcing covariates (demographics ± social indicators) into the model without penalty. Two adjustment plans were used: minimally adjusted (demographics) and further adjusted (demographics + social indicators).
(2) XGBoost to capture nonlinearities and interactions, with hyperparameters optimized via parallel Bayesian optimization (the R package ParBayesianOptimization), an initial 3000 rounds, and early stopping based on a moving-average MSE criterion. SHAP values (SHAPforxgboost) were used to interpret feature importance and directionality.
Models were run in the full sample and within clusters under both adjustment plans. Performance was evaluated by RMSE in the training and testing sets.

Sensitivity analyses: Linear mixed models specifying twin pair in the model were used to estimate within-pair effects (excluding zygosity and parental education, which do not vary within pairs), using the features selected by elastic net. Post-hoc linear regressions of the land use mix index versus log-GBI were run with minimal and further adjustment, using robust standard errors clustered by family. Statistical significance was set at p<0.05, and 95% CIs are reported.
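The twin-aware train/test split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `split_twins`, the `(person_id, family_id)` record layout, and the random choice of which co-twin goes to which set are all hypothetical. The synthetic IDs only reproduce the reported counts of 589 complete pairs and 626 singletons.

```python
import random
from collections import defaultdict


def split_twins(records, seed=0):
    """Split individuals so that co-twins never share a set.

    records: iterable of (person_id, family_id) tuples.
    Complete pairs contribute one twin to training and one to testing;
    any family that is not a complete pair is treated as singleton(s)
    and assigned to training.
    """
    rng = random.Random(seed)
    families = defaultdict(list)
    for person_id, family_id in records:
        families[family_id].append(person_id)

    train_ids, test_ids = [], []
    for family_id in sorted(families):
        members = families[family_id]
        if len(members) == 2:
            # Randomly decide which co-twin goes to which set (assumption).
            first, second = rng.sample(members, 2)
            train_ids.append(first)
            test_ids.append(second)
        else:
            train_ids.extend(members)
    return train_ids, test_ids


# Synthetic IDs matching the reported sample structure.
records = [(f"P{fam}-{twin}", fam) for fam in range(589) for twin in (1, 2)]
records += [(f"S{fam}", fam) for fam in range(589, 589 + 626)]

train_ids, test_ids = split_twins(records, seed=42)
print(len(train_ids), len(test_ids))  # 1215 589
```

Because each family contributes at most one person to each set, observations within the training set, and within the testing set, can be treated as unrelated, while every tested twin has a co-twin in the training set.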
Key Findings
- Clustering: Two distinct land use environment clusters were identified. Cluster 2 represented more urbanized/city-center areas with higher high-density residential land use; Cluster 1 represented more suburban areas with higher low-density residential land use. Mean GBI scores: overall 4.42 (SD 4.7), Cluster 1: 4.05 (SD 4.4), Cluster 2: 4.67 (SD 4.8); p=0.01 between clusters. Clusters differed in smoking, work status, secondary education, parental education, and all four social indicators.
- Elastic net models (coefficients are for log-GBI; where two values are given, they are the minimally adjusted → further adjusted coefficients):
  • Cluster 1 (suburban): After minimal adjustment, 11 land use exposures were selected. Agricultural residential land use within 100 m had the largest positive coefficient (0.097). With further adjustment including social indicators, 17 exposures were selected; agricultural residential (100 m) remained the largest positive coefficient, attenuated to 0.067. Other selected exposures included commercial and industrial (300 m; 0.084→0.065), infrastructure (300 m; −0.031→−0.029), high-density residential (100 m; 0.089), high-density residential (500 m; 0.046→0.026), low-density residential (500 m; 0.035→0.036), urban green (300 m; 0.081→0.058), urban green (500 m; 0.010), natural (100 m; −0.003), natural (300 m; −0.014), water (300 m; 0.002), water (500 m; 0.020→0.012), and agricultural residential (500 m; −0.067→−0.019).
  • Cluster 2 (city center): No land use exposures remained in the model under either adjustment plan.
  • Overall sample: Several exposures were selected; notably, low-density residential (100 m) had the same minimal-adjustment coefficient as Cluster 1 (−0.011).
- Mixed models (within-pair): In Cluster 1, commercial and industrial (300 m) was significantly and positively associated with log-GBI after minimal adjustment, but the association attenuated after further adjustment. In the overall model, low-density residential (100 m) remained significantly protective (higher percentage associated with lower GBI) under both adjustment plans.
- XGBoost and SHAP:
  • Cluster 1: Top land use contributors included natural (100 m; most important under further adjustment) and commercial and industrial (300 m).
  • Cluster 2: Infrastructure (300 m) was consistently the most important exposure in both adjustment plans.
  • Nonlinear effects were evident. For infrastructure (300 m) in Cluster 2, SHAP dependence suggested a flat SHAP value up to ~10% coverage, then a sharp increase with the effect shifting from negative to positive beyond ~10%, and a slower increase beyond ~20%.
- Model performance: RMSEs in training and testing were generally lower than the SDs of log-GBI (overall SD 0.8825; Cluster 1 SD 0.8851; Cluster 2 SD 0.8774), indicating reasonable performance.
- Land use mix index: In Cluster 1, the 300 m mix index showed a significant crude positive association with log-GBI (beta 0.51; 95% CI 0.02, 1.01), but associations were not significant after minimal or further adjustment. No significant associations in overall or Cluster 2 models after adjustment.
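The comparison of RMSE with the SD of log-GBI in the model performance item rests on a simple fact: a model that predicts the sample mean for everyone has an RMSE equal to the population SD of the outcome, so an RMSE below the SD means the model does better than that mean-only baseline. The sketch below demonstrates this with hypothetical values; none of the numbers come from the study, and since the fitted models also include covariates, a lower RMSE cannot be attributed to land use alone.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Hypothetical log(GBI + 1) values and hypothetical model predictions.
observed = [0.0, 0.69, 1.10, 1.61, 2.08, 2.40]
model_predictions = [0.2, 0.8, 1.0, 1.5, 2.0, 2.2]

# Baseline: predict the mean of the outcome for every individual.
mean_only = [statistics.fmean(observed)] * len(observed)

baseline_rmse = rmse(observed, mean_only)
population_sd = statistics.pstdev(observed)  # equals baseline_rmse
model_rmse = rmse(observed, model_predictions)

print(model_rmse < population_sd)  # True for these hypothetical values
```

In a held-out test set the baseline would normally use the training mean, and a sample SD (dividing by n − 1) is slightly larger than the population SD, so the equality holds only approximately in practice.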
Discussion
The study demonstrates that the relationship between urban land use and depressive symptoms in young adults is complex, context-dependent, and involves both linear and nonlinear components. Suburban and city-center land use environments show different associations. In suburban settings, multiple land use characteristics were linked to depressive symptoms: agricultural residential land use within 100 m had the largest positive coefficient, while infrastructure within 300 m had a negative coefficient. In city-center contexts, no individual land use exposure was selected in the linear penalized models, although XGBoost identified infrastructure (300 m) as the most important contributor, with a nonlinear pattern. Adjusting for social indicators changed the set of selected exposures and attenuated several coefficients in suburban areas, underscoring the importance of socioeconomic context. The findings align with heterogeneous literature on urbanization and depression and suggest that multiple, interrelated land use features, spatial scales (buffers), and nonlinearities contribute to mental health outcomes. The results advocate for holistic, multi-exposure analytic approaches and segmentation by land use context to inform urban planning and mental health interventions. Sensitivity analyses suggest some within-pair (potentially non-shared environmental) effects but also attenuation after further adjustment, indicating potential residual confounding and/or genetic or familial influences.
Conclusion
This work is, to the authors’ knowledge, the first to analyze multiple urban land use exposures jointly in relation to depressive symptoms in young adulthood, revealing both linear and nonlinear associations and marked heterogeneity across suburban and city-center land use contexts. Multi-model, multi-exposure frameworks (elastic net and XGBoost) can prioritize influential land use features beyond single-index approaches. Cluster-based segmentation highlights that effects may be context-specific. Given modest sample size, temporality constraints, and model characteristics, findings should be interpreted cautiously. Future research should employ longitudinal designs, leverage twin and genetic structures more fully, incorporate additional physical exposures (e.g., air pollution, noise), enhance interpretability of machine learning outputs, and explore comprehensive environmental profiles to guide urban planning for mental health.
Limitations
- Temporality: Depressive symptom assessments preceded the 2012 land use exposures, limiting causal inference and directionality.
- Exposure stability: Lack of life-course exposure assessment may affect accuracy of exposure timing and duration.
- Sample size: Relatively modest for high-dimensional analyses; risk of overfitting mitigated but not eliminated.
- Limited use of twin design: Genetic influences were not fully disentangled; mixed models explored within-pair effects but comprehensive twin modeling was not applied.
- Potential residual confounding: Other environmental exposures (air pollution, noise) not directly modeled may confound associations.
- Interpretability: Machine learning models revealed nonlinear patterns but detailed mechanistic interpretation remains challenging.

