
Explainable dimensionality reduction (XDR) to unbox AI ‘black box’ models: A study of AI perspectives on the ethnic styles of village dwellings
X. Li, D. Chen, et al.
This research, led by Xun Li, Dongsheng Chen, Weipan Xu, Haohui Chen, Junjun Li, and Fan Mo, presents an explainable Dimensionality Reduction (XDR) framework that translates high-dimensional AI knowledge into interpretable features. A case study on the ethnic styles of village dwellings in Guangdong, China, identifies the features that matter most for distinguishing architectural styles and relates them to culture and architecture.
Introduction
The study addresses the challenge that state-of-the-art AI systems learn largely tacit, uninterpretable knowledge, creating ethical and technical risks in real-world deployment. Explainable AI (XAI) seeks to make model reasoning transparent and to convert tacit model behavior into explicit, human-understandable knowledge. Existing unsupervised dimensionality reduction methods (e.g., PCA, ICA) used to interpret deep features mix latent factors, lack quantitative links to domain concepts, and can obscure multiple distinct types when leading components dominate variance. This work proposes an explainable Dimensionality Reduction (XDR) framework that infuses domain knowledge into model interpretation to translate high-dimensional deep features into explicit, architectural semantics. The research question is whether AI models can discriminate ethnic styles of rural dwellings (Canton, Hakka, Teochew, and mixed) from satellite imagery, and which spatial features drive these distinctions. The purpose is to unbox a Mask R-CNN building segmentation model’s features, align them with architectural typologies, and map ethnic style distributions and proximities at scale, thereby confirming and extending knowledge in architectural and historical geography.
Literature Review
The paper situates itself within XAI methods including dimensionality reduction, feature importance, attention mechanisms, knowledge distillation, and surrogate modeling. Traditional unsupervised approaches (PCA, ICA) help visualize deep features but do not quantitatively map neural features to domain semantics and can conflate distinct knowledge components. Tools like LIME and SHAP enable local interpretability of classifiers, but there is limited evidence that XAI methods yield new domain knowledge. Prior research in architectural history and human geography has richly documented typologies and cultural-historical evolution of traditional dwellings in Guangdong (Canton, Hakka, Teochew) through fieldwork, yet systematic, large-scale annotation and mapping remain scarce. The authors argue for supervised, domain knowledge-infused dimensionality reduction to achieve interpretable, quantifiable links between deep features and architectural attributes, enabling both validation of expert knowledge and discovery of new insights (e.g., mixed styles, migration evidence).
Methodology
Study area and data: Guangdong Province, China, characterized by cultural diversity shaped by historical migration, with established architectural knowledge for Canton, Hakka, and Teochew dwellings. Data include high-resolution RGB satellite imagery (MapQuest, 0.3 m/pixel) and POIs to locate traditional villages.
Pre-condition (black-box model): A Mask R-CNN building segmentation model (ResNet + FPN backbone) pre-trained on COCO (~1.5M instances) and fine-tuned with >10,000 manually annotated building footprints across Guangdong (validated in prior work). Outputs include building instances, bounding boxes, and classes; this study uses bounding boxes.
XDR framework steps:
1) Pyramid Layer Selection: From FPN layers (P2–P5), select P3 using the size-based rule k = floor(k0 + log2(sqrt(wh)/224)), with k0 = 4 (Lin et al., 2017), where w and h are the width and height of the region in pixels. P3 balances spatial resolution and semantic richness for typical building sizes.
2) Building- and Village-Scale Feature Extraction: Extract the 256-channel P3 feature maps over the image, then crop per building bounding box to obtain building-scale features. Aggregate per building via global average pooling to X_house, then aggregate across all buildings in a village via mean pooling to obtain village-scale features X_village. This focuses on building features and reduces contamination from non-building land cover.
3) Infusion of Domain Knowledge: Domain experts define 10 village styles (nine historical subtypes across Canton, Hakka, Teochew, plus Modern). Experts label 7–9 images per style (84 total). Train an XGBoost model on X_village to predict village type with a 70/30 train-test split, achieving 97% test accuracy. Apply SHAP to quantify feature importance for class differentiation. For robustness, use random sampling: per run, select 60% of samples per type for training; repeat 500 times and average SHAP values to rank feature maps by importance. Visualize top features and infer architectural semantics (e.g., patio, size, length, direction/asymmetry) by overlaying activation maps on imagery. Select the top n=11 semantic features M_semantics from the 256 channels and aggregate to village-scale X_semantics.
4) Proximity Evaluation: Compute pairwise cosine similarity between villages using X_semantics, construct a similarity graph, cluster with k-means, and visualize in Gephi. Compare proximity relationships and geographic distributions with known migration records to validate findings.
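Steps 1 and 2 above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the clamping to P2–P5, the stride of 8 for P3 (the usual value for a ResNet + FPN backbone), and the nested-list feature map layout are all assumptions made for the example. The level rule uses the floor from Lin et al. (2017).

```python
import math

def fpn_level(w, h, k0=4, k_min=2, k_max=5):
    """Size-based level rule k = floor(k0 + log2(sqrt(w*h)/224)).
    Clamping to P2..P5 is an assumption for this sketch."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(k_min, min(k_max, k))

def crop_and_pool(feature_map, box, stride):
    """Global average pooling of one building box.
    feature_map: list of channels, each a list of rows (C x H x W).
    box: (x0, y0, x1, y1) in image pixels, assumed to lie inside the image.
    stride: image pixels per feature-map cell (hypothetical parameter)."""
    x0, y0, x1, y1 = box
    c0, r0 = x0 // stride, y0 // stride
    c1 = max(c0 + 1, math.ceil(x1 / stride))  # keep at least one cell
    r1 = max(r0 + 1, math.ceil(y1 / stride))
    pooled = []
    for channel in feature_map:
        cells = [v for row in channel[r0:r1] for v in row[c0:c1]]
        pooled.append(sum(cells) / len(cells))
    return pooled  # one value per channel: X_house

def village_vector(feature_map, boxes, stride=8):
    """Mean of the per-building vectors: X_village."""
    houses = [crop_and_pool(feature_map, b, stride) for b in boxes]
    return [sum(col) / len(houses) for col in zip(*houses)]

# Toy feature map: 2 channels on a 4 x 4 grid (a real P3 map has 256 channels).
toy_map = [
    [[1.0] * 4 for _ in range(4)],         # channel 0: constant
    [[float(r)] * 4 for r in range(4)],    # channel 1: equals the row index
]
toy_boxes = [(0, 0, 16, 16), (16, 16, 32, 32)]

print(fpn_level(112, 112))                 # level for a 112 x 112 pixel box
print(village_vector(toy_map, toy_boxes))  # village-scale feature vector
```

Pooling inside the boxes first, and only then averaging across buildings, is what keeps roads, fields, and water out of the village vector.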
Ablation: Compare five settings for village networks: (1) image-scale 256-channel features; (2) PCA (11 PCs) of image-scale features; (3) village-scale 256-channel features (building-pooled); (4) PCA (11 PCs) of village-scale features; (5) XDR’s 11 semantic features at village-scale. Assess cluster structure clarity and semantic interpretability.
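The difference between settings (3) and (5) comes down to which channels enter the cosine similarity. The sketch below illustrates that with toy vectors. It is an assumption-laden example: the village names, the 4-channel vectors, and the similarity threshold used to keep an edge are hypothetical, since the source does not state how graph edges were chosen. The k-means clustering and the Gephi visualization are not shown.

```python
import math
from itertools import combinations

def select_channels(vector, channel_ids):
    """Keep only the chosen channels of a village vector."""
    return [vector[i] for i in channel_ids]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_edges(villages, channel_ids, threshold=0.9):
    """Pairwise similarity on the selected channels.
    Returns (name_a, name_b, similarity) for pairs at or above the
    threshold; the threshold itself is a hypothetical choice."""
    reduced = {name: select_channels(vec, channel_ids)
               for name, vec in villages.items()}
    edges = []
    for a, b in combinations(sorted(reduced), 2):
        sim = cosine_similarity(reduced[a], reduced[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges

# Toy village-scale vectors with 4 channels (the real ones have 256).
villages = {
    "v1": [1.0, 9.0, 0.0, 9.0],
    "v2": [2.0, 0.0, 0.0, 1.0],
    "v3": [0.0, 5.0, 3.0, 5.0],
}

full_edges = similarity_edges(villages, channel_ids=range(4))
selected_edges = similarity_edges(villages, channel_ids=[0, 2])
print(full_edges)      # neighbours when every channel is used
print(selected_edges)  # neighbours when only the selected channels are used
```

In this toy case the two settings link different village pairs, which is the kind of change in network structure the ablation compares.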
Key Findings
- The XGBoost classifier on village-scale features achieved 97% accuracy on the hold-out test set for 10 styles (9 historical + modern).
- XDR identified 11 prominent feature maps (M_semantics = {M209, M81, M70, M89, M31, M125, M79, M176, M37, M149, M193}). After aggregation: X_semantics = {X209, X81, X70, X89, X31, X125, X79, X176, X37, X149, X193}.
- Architectural semantics: M89 is highly sensitive to patios; other key features capture size, length, direction, and asymmetry. These correspond to domain-relevant attributes for distinguishing Canton, Hakka, Teochew, and mixed styles.
- Clustering and proximity: Using X_semantics, villages grouped into eight clusters corresponding to domain-labeled styles: Hakka (Shaoguan-Qingyuan type), Hakka (Meizhou type), Teochew, Canton, Canton-Hakka mixed, Canton-Teochew mixed, Hakka-Teochew mixed, and Modern. Mixed clusters naturally lie between single-ethnicity clusters, indicating style blending.
- Geographic distribution (Voronoi visualization): Canton villages dominate central/western Guangdong; Hakka (Meizhou type) in the east (Meizhou, Heyuan); Hakka (Shaoguan-Qingyuan type) in Shaoguan and Qingyuan; Teochew in the eastern coastal cities (Shantou, Jieyang). Mixed styles are distributed across transition areas (e.g., Hakka-Teochew in Chaozhou; Canton-Teochew in Shanwei; Canton-Hakka broadly around Canton areas).
- New historical insight: Field validation in Yuechang Village (Zengcheng, Guangzhou) identified a stele documenting the Pan clan migration from Xinfeng County (Shaoguan) ~200 years ago, aligning with the fourth Hakka migration. Satellite imagery shows Canton-Hakka mixed styles in migratory villages (Yuechang, Dongdong), confirming XDR’s discovery and complementing historical geography.
- Ablation: XDR’s village network (setting 5) provides the clearest, semantically interpretable clusters. The PCA-based settings yield cluster structure but lack semantic interpretability. Pooling features over buildings improves robustness by excluding non-building features. PCA on the building-pooled village-scale features shows clusters but misplaces Canton relative to Teochew and Hakka when compared with known geography.
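The resampling loop behind the feature ranking (60% of samples per type, 500 repetitions, averaged importances) can be sketched as follows. This is only an outline of the loop: the scoring function here is a deliberately simple stand-in (the spread of per-class channel means), not XGBoost with SHAP, and all function and parameter names are hypothetical. A SHAP-based scorer could be passed in through `score_fn`.

```python
import random
from collections import defaultdict

def stand_in_importance(samples):
    """Placeholder score per channel: max minus min of the per-class means.
    This is NOT SHAP; it only gives the loop something to average."""
    by_class = defaultdict(list)
    for label, vec in samples:
        by_class[label].append(vec)
    class_means = [[sum(col) / len(vecs) for col in zip(*vecs)]
                   for vecs in by_class.values()]
    return [max(col) - min(col) for col in zip(*class_means)]

def averaged_ranking(samples, fraction=0.6, runs=500, top_n=11,
                     seed=0, score_fn=stand_in_importance):
    """Repeat a per-class subsample, score the channels, average the scores.
    samples: list of (label, vector) pairs.
    Returns (indices of the top_n channels, mean score per channel)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = defaultdict(list)
    for label, vec in samples:
        by_class[label].append((label, vec))
    n_channels = len(samples[0][1])
    totals = [0.0] * n_channels
    for _ in range(runs):
        subset = []
        for items in by_class.values():
            k = max(1, round(fraction * len(items)))  # 60% of each type
            subset.extend(rng.sample(items, k))
        for i, score in enumerate(score_fn(subset)):
            totals[i] += score
    mean_scores = [t / runs for t in totals]
    ranked = sorted(range(n_channels),
                    key=lambda i: mean_scores[i], reverse=True)
    return ranked[:top_n], mean_scores

# Toy data: 2 types, 8 samples each, 3 channels; only channel 2 separates them.
toy_samples = []
for i in range(8):
    toy_samples.append(("a", [1.0, 0.1 * (i % 2), 0.0]))
    toy_samples.append(("b", [1.0, 0.1 * (i % 2), 5.0]))

top_channels, scores = averaged_ranking(toy_samples, runs=50, top_n=1)
print(top_channels, scores)
```

Averaging over many subsamples reduces the chance that a channel ranks highly only because of one particular draw of the small labeled set.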
Discussion
The study demonstrates that infusing domain knowledge into dimensionality reduction enables translation of deep, tacit features into explicit architectural semantics. The identified patio, size, length, direction, and asymmetry features directly address the research question by showing which of the Mask R-CNN model’s learned features separate ethnic dwelling styles. The proximity networks and geographic distributions derived from X_semantics align with established field knowledge, which validates the approach, and they also reveal mixed-style villages that reflect cultural integration processes. The migration case substantiates that XDR can surface historically meaningful patterns, such as undocumented or less-documented migration influences, thus extending domain knowledge.
Significance: XDR bridges AI interpretability and domain scholarship by yielding human-readable features and large-scale, low-cost mapping of cultural typologies from satellite imagery. It confirms expert knowledge (e.g., Hakka shapes, patio variants, size ranges, symmetry) and discovers new knowledge (mixed-style areas, migration traces). Methodologically, combining SHAP with feature selection on building-focused deep features ensures both predictive performance and interpretability. The framework’s few-shot labeling requirement is practical for human geography, where labeled data are scarce.
Conclusion
This paper introduces XDR, a supervised, domain knowledge-infused dimensionality reduction framework that explains deep features from a building segmentation model and maps ethnic dwelling styles in Guangdong. Main contributions: (1) an interpretable pipeline from FPN features to village-level semantic features via SHAP-guided selection; (2) confirmation that patio, size, length, direction, and asymmetry are key determinants of Canton, Hakka, Teochew, and mixed styles; (3) proximity relationships and geographic distributions consistent with field studies; and (4) discovery of evidence for the fourth Hakka migration influencing mixed-style villages.
Future work: enhance robustness by integrating multiple top-ranked features per class (e.g., top-3) to retain secondary cues; mitigate dependence on bounding box quality; scale to other regions and cultural contexts; and incorporate additional modalities (historical records, higher-resolution imagery) to enrich semantic mappings and migration inference.
Limitations
- Dependence on Mask R-CNN outputs: Inaccurate or missing building bounding boxes reduce the reliability of feature extraction and downstream XDR analysis.
- Loss of secondary information: Current XDR emphasizes the most important features per category, potentially omitting secondary/tertiary cues that influence style differentiation.
- Limited labeled samples: Although few-shot friendly, the 84 labeled examples constrain coverage of stylistic variability; broader labeling could refine semantic feature selection and clustering stability.
- PCA baselines lack semantic interpretability; while XDR addresses this, comparisons indicate some sensitivity to feature selection and pooling strategies.