AI perceives like a local: predicting citizen deprivation perception using satellite imagery

Environmental Studies and Forestry


A. Abascal, S. Vanhuysse, et al.

This research by Angela Abascal and colleagues explores how satellite imagery and AI can predict citizen perceptions of deprivation in urban settings. By combining deep learning with citizen science, the team derives deprivation scores that can help prioritise urban needs and guide policy implementation for sustainable development.
Introduction

Urban inequality poses a major social challenge, with slums in low- and middle-income countries (LMICs) exhibiting high levels of physical deprivation and related risks (e.g., energy poverty, heat exposure). Earth observation (EO) combined with artificial intelligence and machine learning (AI/ML) has advanced the mapping of urban appearance and of elements in slums, yet model accuracy and global datasets remain limited in high-deprivation contexts due to variability across cities, a lack of reliable reference data, and underrepresentation of deprived areas in training sets. Street-level imagery, the mainstay of perception studies, is sparse in LMIC slums due to access constraints, creating a geodata gap. This study explores integrating EO data, citizen science, and AI to assess perceived deprivation, addressing three research questions: (i) Can satellite imagery capture citizen-perceived physical deprivation? (ii) Can AI predict citizens' deprivation perception from satellite imagery? (iii) Which physical features most influence perceived deprivation? The paper presents results and discussion, followed by data, study area, and methods.

Literature Review

Prior work shows EO's broad coverage and increasing resolution enable cost-effective urban analysis. The adoption of ML/DL and transfer learning has produced global layers (built-up areas, urban extents, building footprints). However, accuracy is lower in deprived areas due to limited in situ data and variable manifestations of deprivation. Citizen science has assessed urban qualities (safety, cleanliness, livability, wealth) primarily via street-level images in high-income contexts, with limited application in LMICs due to low street-view coverage in slums. Image rating approaches can be uncertain when differences are subtle; pairwise comparisons offer improved consistency. The integration of EO imagery into citizen perception assessments remains largely unexplored, motivating this study that links EO, citizen science, and AI to capture perceived deprivation in slums and identify influential environmental features.

Methodology

Study area and data: The study covers all slum areas of Nairobi (~20 km²), partitioned into 1 ha grid cells (1998 subsets of 100 m × 100 m). Datasets: (1) WorldView-3 (WV3) very-high-resolution imagery (panchromatic 0.30 m; 8-band multispectral 1.20 m; orthorectified and radiometrically calibrated; mosaic dates 13/01/2019 and 01/02/2019); (2) land-cover classification derived from WV3; (3) OpenStreetMap roads; (4) OSM rivers; (5) Google Open Buildings. DL used WV3 bands; ML used features derived from all five datasets.
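The 1 ha gridding step above can be sketched in plain Python. The coordinates below are illustrative placeholders, not the actual Nairobi extent, which in the study is defined by the mapped slum boundaries.

```python
# Minimal sketch: partition a projected bounding box (metres) into
# 100 m x 100 m grid cells (1 ha each), analogous to splitting the
# study area into ~1998 analysis subsets. Coordinates are synthetic.

CELL_SIZE = 100  # metres; one cell = 1 ha

def make_grid(xmin, ymin, xmax, ymax, cell=CELL_SIZE):
    """Return (x, y) lower-left corners of every cell covering the box."""
    cells = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            cells.append((x, y))
            x += cell
        y += cell
    return cells

# A 2 km x 1 km box yields 20 x 10 = 200 one-hectare cells.
grid = make_grid(0, 0, 2000, 1000)
print(len(grid))  # 200
```

In the study, only cells intersecting slum boundaries are kept, which is why the count (1998) is not a neat rectangle.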

Citizen science and deprivation scoring: A mobile web platform displayed random pairs of 100 m × 100 m satellite subsets without location/context. Participants (n=186 across seven slums) voted on "Which is the best place to live?", with no tie option. In total, 1,089,302 pairwise votes were recorded, covering 629,027 unique comparisons; duplicate comparisons supported consistency checks. Two metrics were computed: comparison consistency (per-individual consistency across repeated identical comparisons) and group agreement (consensus across citizens for the same comparison). An individual divergence metric quantified deviation between a participant's choices and group agreement; high-divergence outliers (five participants) were excluded. Votes were converted to a continuous deprivation perception score via TrueSkill (Bayesian rating), modelling each subset as N(μ, σ²) within a free-for-all match scheme. Scores were normalised to 0–1, where higher denotes "best place to live" (least deprived), producing a deprivation map at 100 m resolution.
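The vote-to-score conversion can be illustrated with a small sketch. The paper uses TrueSkill (in Python, e.g. the `trueskill` package's `Rating`/`rate_1vs1`); the stand-in below uses a simpler Elo-style update on synthetic cells and votes purely to show the pipeline from pairwise winners to a normalised 0–1 score.

```python
import random

# Elo-style stand-in for TrueSkill: convert "which is the best place to
# live?" pairwise votes into a continuous per-cell score in [0, 1].
# Cell IDs and votes below are synthetic, not the study data.

K = 16.0  # step size of the rating update

def elo_update(ratings, winner, loser):
    expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1 - expected)
    ratings[loser] -= K * (1 - expected)

def score_cells(votes, cell_ids):
    ratings = {c: 1000.0 for c in cell_ids}
    for winner, loser in votes:
        elo_update(ratings, winner, loser)
    lo, hi = min(ratings.values()), max(ratings.values())
    # Normalise to 0-1: higher = perceived "best place to live".
    return {c: (r - lo) / (hi - lo) for c, r in ratings.items()}

random.seed(0)
cells = ["A", "B", "C"]               # A assumed least deprived, C most
true_rank = {"A": 2, "B": 1, "C": 0}  # hidden ground truth for the demo
votes = []
for _ in range(300):
    a, b = random.sample(cells, 2)
    votes.append((a, b) if true_rank[a] > true_rank[b] else (b, a))
scores = score_cells(votes, cells)
assert scores["A"] > scores["B"] > scores["C"]
```

TrueSkill additionally tracks a per-cell uncertainty σ that shrinks with more comparisons, which is why the study can treat each subset as N(μ, σ²) rather than a point estimate.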

Deep learning: Two CNN families were tested. (1) A custom VGG-9-like network trained from scratch: three convolutional blocks (32, 64, 128 filters) with ReLU, batch normalisation, and max-pooling per block; two 256-unit dense layers; a linear output for regression; Adam (lr=0.01), 400 epochs, MSE loss, batch size 128. (2) DenseNet-121, trained both from scratch and via transfer learning (pretrained on ImageNet), with the architecture extended by a 1024-unit dense layer and a regression head; Adam optimizer. From scratch: lr=0.001, 200 epochs. Transfer learning: stage 1 froze the backbone and trained the head for 15 epochs (lr=0.001); stage 2 unfroze the backbone and fine-tuned for 200 epochs (lr=0.0001). Inputs were 333×333 px (1 ha at 0.3 m) resampled to 224×224 and standardised (zero mean, unit variance). Data augmentation used vertical flips, rotations of up to 20%, and translations of up to 5% (reflection fill). Band combinations: RGB and RGNir.

Evaluation protocol: To mitigate data scarcity and overfitting, 10-fold cross-validation with 90/10 train/validation splits was used, so each image appears in a validation fold at least once. Each experiment (ML and DL) was repeated ten times per fold; the maximum, mean, and standard deviation of R² and RMSE were reported for the training (T) and validation (V) folds.
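The fold construction is simple enough to sketch in plain Python; `summarise` mirrors the max/mean/SD statistics the tables report, and the `evaluate` step (one train+validate run) is left abstract.

```python
import random
from statistics import mean, stdev

# Sketch of the protocol: shuffle the ~1998 subsets and split into 10
# folds (90/10 train/validation), so every image lands in a validation
# fold exactly once across the 10 runs.

def kfold_indices(n, k=10, seed=42):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

def summarise(r2_per_fold):
    # Report max, mean, and SD across folds, as in the paper's tables.
    return {"max": max(r2_per_fold),
            "mean": mean(r2_per_fold),
            "sd": stdev(r2_per_fold)}

splits = list(kfold_indices(1998))
assert len(splits) == 10
covered = sorted(i for _, val in splits for i in val)
assert covered == list(range(1998))  # each image validated exactly once
```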

Conventional ML: The pipeline comprised data processing, feature extraction, and two-step feature selection: (i) Lasso regularisation (glmnet) to zero out uninformative coefficients with a CV-tuned penalty; (ii) multicollinearity removal by discarding one of any feature pair with |Pearson r| > 0.8 (prioritising features with larger Lasso coefficients and preferring means over medians). Features were computed per 100 m cell: land-cover fractions (buildings, ground, vegetation, water, shadows, vehicles, waste); spectral indices/statistics per class (notably roofs: redness/greenness/blueness means, ranges, and SDs); morphometrics from building footprints (area ratio, proximity, neighbour counts); road and main-road densities (from OSM); presence of rivers; etc. Two transformed feature sets were tested: standardised and log-transformed. Regressors: SVM (RBF, linear, and polynomial kernels), Random Forest (RF), and XGBoost, tuned via caret (tuneLength). Feature importance was analysed using RF permutation/impurity-based importance for the best-performing experiment.
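The two-step feature selection can be sketched in Python (the paper uses R's glmnet/caret; `LassoCV` is the scikit-learn analogue). `select_features` and the synthetic features below are illustrative, not the study's actual variables.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Sketch of the two-step selection: (i) LassoCV zeroes out uninformative
# coefficients with a cross-validated penalty; (ii) for any pair with
# |Pearson r| > 0.8, keep the feature with the larger |Lasso coefficient|.

def select_features(X, y, names, r_thresh=0.8):
    lasso = LassoCV(cv=5, random_state=0).fit(X, y)
    keep = [i for i, c in enumerate(lasso.coef_) if c != 0]
    # Order by |coef| so higher-priority features survive the pruning.
    keep.sort(key=lambda i: -abs(lasso.coef_[i]))
    corr = np.corrcoef(X, rowvar=False)
    selected = []
    for i in keep:
        if all(abs(corr[i, j]) <= r_thresh for j in selected):
            selected.append(i)
    return [names[i] for i in selected]

rng = np.random.default_rng(0)
n = 300
road_density = rng.normal(size=n)
building_frac = rng.normal(size=n)
noise_feat = rng.normal(size=n)                        # unrelated to target
road_copy = road_density + 0.01 * rng.normal(size=n)   # near-duplicate
X = np.column_stack([road_density, building_frac, noise_feat, road_copy])
y = 2 * road_density - building_frac + 0.1 * rng.normal(size=n)
picked = select_features(X, y, ["roads", "buildings", "noise", "roads_copy"])
assert "buildings" in picked
assert sum(f.startswith("roads") for f in picked) == 1  # collinear pair pruned
```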

Key Findings
  • Citizen science feasibility and reliability: 1,089,302 pairwise votes (629,027 unique) were collected from 186 participants. High comparison consistency and group agreement were observed; only five participants with high individual divergence were excluded. TrueSkill produced a continuous deprivation perception score (0–1), mapping intra-slum spatial variation at 100 m.
  • Deep learning performance: Transfer learning with DenseNet-121 (pretrained, RGB) outperformed all other models. Validation metrics: V max R² = 0.841; V mean R² = 0.801; V mean RMSE = 0.767; V sd R² = 0.021. The pretrained RGNir variant had V mean R² = 0.789. Models trained from scratch performed lower (e.g., DenseNet-121 scratch RGB V mean R² = 0.577; RGNir V mean R² = 0.627; VGG-9 scratch RGB V mean R² = 0.487). Training metrics were high with low variance for the pretrained DenseNet-121 (T mean R² = 0.903; T mean RMSE = 0.537), indicating stable convergence without overfitting.
  • Conventional ML performance: Across nine experiments with different feature sets and transforms, best validation performances were achieved by RF or SVM with RBF kernel. The top experiment (I - RF LOG) achieved V max R² = 0.707; V mean R² = 0.667; V mean RMSE = 1.010; V sd R² = 0.027. Some feature sets performed poorly (e.g., E/F with SVM rad STD V mean R² ~0.10–0.15), underscoring the role of feature engineering and transforms.
  • Spatial patterns and residuals: The best DL model reproduced overall geographic patterns of deprivation; residuals (citizen minus predicted) were typically within ±0.1 with no spatial autocorrelation (Moran's I not significant). Largest over- or under-predictions were associated with specific environmental elements (e.g., very small buildings, rivers, small waste sites).
  • Interpretable features: RF feature importance highlighted relative road density as the most influential predictor, followed by building fraction and ground surface. Other strong predictors included river presence, building area ratio, roof color metrics (visible greenness/blueness/redness), water fraction, waste piles, shadows, and building proximity/neighbour metrics. These align with community perspectives that better street connectivity, lower built-up density, and presence of open space indicate lower deprivation.
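The residual check reported above (Moran's I not significant) can be sketched in plain Python. Global Moran's I over a grid with rook (edge-sharing) adjacency is near 0 when residuals show no spatial pattern; the grids below are toy examples chosen to hit the two extremes.

```python
from statistics import mean

# Global Moran's I on a regular grid of cells (rook adjacency).
# Values near 0 suggest no spatial autocorrelation in the residuals.

def morans_i(grid):
    n_rows, n_cols = len(grid), len(grid[0])
    vals = [v for row in grid for v in row]
    m = mean(vals)
    z = {(r, c): grid[r][c] - m for r in range(n_rows) for c in range(n_cols)}
    num = w_sum = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    num += z[(r, c)] * z[(rr, cc)]
                    w_sum += 1.0
    denom = sum(v * v for v in z.values())
    n = n_rows * n_cols
    return (n / w_sum) * (num / denom)

# A checkerboard is perfectly negatively autocorrelated (I -> -1)...
checker = [[(-1) ** (r + c) for c in range(6)] for r in range(6)]
assert morans_i(checker) < -0.9
# ...while two smooth halves are strongly positively autocorrelated.
blocks = [[1.0] * 6 if r < 3 else [0.0] * 6 for r in range(6)]
assert morans_i(blocks) > 0.5
```

In practice significance is judged against a permutation null (shuffling residuals over cells), which is what "Moran's I not significant" refers to.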
Discussion

The study shows that satellite EO imagery can substitute for street-level imagery to capture citizens’ perceived physical deprivation in slums. AI models, particularly transfer-learned DL, accurately predict citizen-derived deprivation scores, while conventional ML, though slightly less accurate, provides interpretability to identify key environmental drivers of perception (road density, building density, open ground, rivers, roof materials/colors, waste). These findings validate that perceived deprivation is strongly tied to observable physical features detectable from VHR imagery. The approach bridges EO state-of-the-art with citizen science, enabling quantification of perceptions relevant to policy and SDG-11 monitoring. Considerations include addressing the digital divide to ensure inclusive participation and acknowledging contextual subjectivity in perceptions; nonetheless, objective AI performance supports the robustness of the derived scores. The method can inform targeted interventions, resource allocation, and monitoring by highlighting spatial patterns of perceived deprivation, with potential for scaling via transfer learning and domain adaptation.

Conclusion

This work demonstrates that: (1) satellite imagery effectively captures aspects of deprivation as perceived by slum residents; (2) AI, especially transfer learning with DenseNet-121, can accurately predict citizen-derived deprivation scores; and (3) conventional ML reveals interpretable physical features driving perceptions (e.g., road density, building/building roof characteristics, rivers, waste). The resulting deprivation score maps provide actionable insights for policymakers to design citizen-centred urban upgrading strategies aligned with SDG-11. Future research will extend to multiple cities to assess replicability and transferability, benchmark additional pretrained models, integrate domain adaptation, and leverage open EO datasets and building footprints to reduce data costs while maintaining accuracy and update frequency.

Limitations
  • Geographical scope: Results are from a single city (Nairobi), limiting generalisability; transferability to other cities may be challenging due to context-specific manifestations of deprivation.
  • Sampling: Participants were not randomly sampled; potential biases in deprivation perception may persist despite consistency checks and exclusion of high-divergence outliers.
  • Data size and cost: Limited labeled samples (1998) and dependency on very-high-resolution imagery (WV3) constrain scalability due to acquisition costs for large areas and frequent updates.
  • Interpretability: DL models, while accurate, lack intrinsic interpretability; reliance on conventional ML for feature importance may miss interactions captured by DL.
  • Connectivity constraints: Participation requires smartphones and internet access; limited connectivity in slums can affect inclusivity.
  • Band limitations: Pretrained models on RGB may not fully exploit NIR information; adapting pretrained weights to multispectral inputs remains nontrivial.