Visual attention predictive model of built colonial heritage based on visual behaviour and subjective evaluation

Architecture


Y. Wu, N. Li, et al.

This study examines how visual behaviour relates to the subjective evaluation of built colonial heritage. Using eye-tracking and questionnaire data from a 54-participant prediction group, the researchers built a model that classifies visual attention as low, middle, or high. Conducted by Yue Wu, Na Li, Lei Xia, Shanshan Zhang, Fangfang Liu, and Miao Wang, the research links eye movements to the perception of heritage scenes and is intended to help architects plan conservation strategies.

Introduction
The study addresses how people perceive and visually attend to built colonial heritage, a key component of cultural heritage often under-protected and under-studied in rural contexts. It seeks to quantify and classify visual attention by linking objective eye-movement behaviour with subjective evaluations. Given limited generalisability of prior work focused on urban heritage or natural landscapes, and the potential non-linearity in relationships between attention, behaviour, and appraisal, the authors propose using a BP neural network. Research aims: (1) compare visual behaviours and subjective evaluations across different scenes of built colonial heritage; (2) test associations between visual attention and fixation/pupil metrics, with the expectation that attention rises with longer gazes on unique elements; (3) test associations between visual attention and subjective indicators of architectural features, environmental atmosphere, and spatial perception; (4) evaluate whether a combined model using eye-movement and subjective metrics predicts attention levels more accurately than either alone.
Literature Review
Visual attention spans multiple fields: biology investigates neural mechanisms; medicine studies ocular physiology and disorders; psychology examines reading and scene perception; computer science models attention for image processing; and landscape visual evaluation links landscape features to human responses. Attention involves bottom-up and top-down mechanisms; overt and covert attention typically co-occur in scene viewing. Eye tracking provides objective measures (fixations, saccades, pupil) reflecting attention allocation. In built environment research, key visual characteristics include disturbance, historicity, visual scale, complexity, and naturalness. Architectural features (style, form, dimension, decoration, texture, colour) influence perception, with traditional forms often preferred. Eye tracking has been applied to light environments and, increasingly, visual attention. Specific façade and scene attributes (balance, window/door presence, number of buildings, signage colour, weather) influence visual behaviour. Buildings, sky, vegetation attract attention; high-quality and unique elements garner longer observation. Studies show eye-movement differences across landscape types and urbanisation levels, and often positive relations between eye metrics and preference; regression models have been developed for visual attention. However, eye tracking cannot fully replace subjective evaluation, and differences when viewing architectural heritage across scenes remain underexplored. Advances in machine learning, notably BP neural networks, enable non-linear modelling and have been used to assess perceptual qualities. Gaps include focus on urban or natural landscapes rather than rural built-colonial contexts, and limited integration of objective eye-tracking with subjective appraisal in predictive models.
Methodology
Study area: Zhenbei Village, Yimianpo Town, Shangzhi City, Heilongjiang Province (along the Chinese Eastern Railway), a historic rural settlement with numerous well-preserved Russian-style buildings (constructed mainly 1903–1904). Sixteen samples (photos) were selected spanning five scene types: traditional dwellings (4), public buildings (3), streets (3), public spaces (3), and courtyards (3). Dwellings include single-, double-, and multi-family types; structures include wood-veneer and brick-wood.
Stimuli: 200 photographs were taken under consistent technique and weather; 16 were selected at 2560×1600 px, 300 dpi.
Design: Combined eye-tracking experiment (free viewing) and Semantic Differential (SD) questionnaire. Visual attention was rated on a 10-point scale (1–10). Subjective evaluation used a seven-point Likert scale (−3 to 3) with 13 adjective pairs across three dimensions: Architectural features (Monotonous–Rich; Disproportionate–Proportionate; Inappropriate size–Appropriate size; Roughly Decorated–Exquisitely Decorated; Plain–Gorgeous); Environmental atmosphere (Simple–Complex; Dirty environment–Clean environment; Traditional–Modern); Spatial perception (Weak cultural atmosphere–Strong cultural atmosphere; Little vegetated–Much vegetated; Closed–Open; Unpleasant–Pleasant; Unattractive–Attractive). Reliability: Cronbach’s alpha = 0.953.
Participants: College students from Harbin Institute of Technology. Pretest: n=7. Formal experiment: n=90, randomized into prediction and test groups; after data quality screening (≥80% sampling rate; calibration accuracy/precision ≤0.65°), 54 participants remained in the prediction group and 24 in the test group. Groups were balanced by gender, age, and education; all participants had normal or corrected vision ≥1.0 and normal colour vision, and were naive to the images.
Apparatus and setting: Tobii Pro Fusion screen-based eye tracker (250 Hz) on a 16.1" laptop (2560×1600 px) in a quiet, windowless seminar room with artificial lighting. Viewing distance: 60–65 cm; pupil centre corneal reflection (PCCR) tracking with head-movement compensation.
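The reliability figure reported for the SD questionnaire is a Cronbach's alpha. As a minimal sketch of how such a value is computed, the snippet below applies the standard formula, alpha = k/(k−1) · (1 − Σ item variances / variance of total scores), to a small made-up rating matrix. The function name `cronbach_alpha` and the ratings are illustrative assumptions, not the study's data or code (the study ran its statistics in SPSS).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows, each holding k item ratings."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings on the -3..3 scale: 5 respondents x 3 adjective pairs.
ratings = [
    [2, 3, 2],
    [1, 1, 0],
    [-1, -2, -1],
    [3, 3, 2],
    [0, -1, 0],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values close to 1 indicate that the items vary together across respondents, which is how a figure such as 0.953 is usually read.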
Procedure: After calibration, 16 images were presented automatically for 10 s each with a 2 s inter-stimulus blank; total eye-tracking time was ~4 min; free viewing (no task). Afterward, participants completed the visual attention and SD questionnaires, first browsing all images; image order was counterbalanced across three sequences. Data were excluded if interruptions occurred (e.g., sneeze/cough).
Measures: Eye-movement metrics were computed at the overall scene level over the 10 s viewing period. Fixation metrics: total fixation duration (TFD, s), average fixation duration (AFD, s), fixation count (FC), time to first fixation (TFF, s), first fixation duration (FFD, s). Saccade metrics: average saccade amplitude (ASA, °), saccade count (SC), average saccade peak velocity (ASPV, °/s). Pupil metric: average pupil diameter (APD, mm).
Rationale: Longer TFD/AFD indicate higher processing load or attraction; higher FC may reflect attention or processing difficulty; shorter TFF indicates fast attraction; FFD reflects initial interest; larger SC indicates longer search; higher ASPV indicates fast browsing with fewer interesting elements; larger APD relates to interest.
Analyses: Eye data were processed in Tobii Pro Lab; statistics were run in SPSS 25.0. Normality was tested by Shapiro–Wilk; homogeneity of variance was checked as appropriate. One-way ANOVA (p<0.05 as significant); for visual attention, Brown–Forsythe and Welch ANOVA with Bonferroni post hoc where applicable. Pearson correlations among eye metrics, subjective indicators, and visual attention; additional element-level TFD correlations (e.g., roofs, chimneys, windows, text, ground).
Predictive modelling: BP Neural Network for three-class classification of visual attention (Low=1–4; Middle=5–7; High=8–10). Inputs: variables significantly correlated with visual attention, namely 3 eye metrics (ASPV, ASA, APD) and 12 subjective indicators (all except “Traditional–Modern”). Architecture: 3-layer network (input, one hidden, output), sigmoid activation; hidden nodes determined via Nhid = Nin + Nout + a, a∈[1,10], chosen by trial for best accuracy; training/test split 75%/25%. Model building used the prediction group (n=54); generalisability was tested on the independent test group (n=24). Accuracy was computed as the proportion correctly classified; additional group models were explored by scene type; independent-samples t-tests compared accuracies between general and test models.
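The modelling step described above can be illustrated with a small back-propagation network. The sketch below is not the authors' implementation: it uses two synthetic input features instead of the 15 indicators, a hidden-layer size picked by hand, a squared-error loss, and made-up ratings. All names (`attention_class`, `BPNetwork`, `train_set`, `test_set`) are hypothetical. It keeps the three ingredients named in the text: the Low/Middle/High binning of the 1–10 rating, a single hidden layer with sigmoid activation, and a 75%/25% training/test split.

```python
import math
import random


def attention_class(rating):
    """Map a 1-10 rating to 0 (Low, 1-4), 1 (Middle, 5-7) or 2 (High, 8-10)."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    return 0 if rating <= 4 else 1 if rating <= 7 else 2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """One hidden layer, sigmoid units, squared-error loss, per-sample updates."""

    def __init__(self, n_in, n_hid, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hid)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        inputs = list(x) + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in self.w_hid]
        out = [sigmoid(sum(w * v for w, v in zip(row, hid + [1.0]))) for row in self.w_out]
        return hid, out

    def train_step(self, x, label, lr=0.5):
        hid, out = self.forward(x)
        target = [1.0 if k == label else 0.0 for k in range(len(out))]
        # Output deltas: error times the sigmoid derivative o * (1 - o).
        d_out = [(o - t) * o * (1 - o) for o, t in zip(out, target)]
        # Hidden deltas: propagate the output deltas back through w_out.
        d_hid = [h * (1 - h) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                 for j, h in enumerate(hid)]
        for row, d in zip(self.w_out, d_out):
            for j, v in enumerate(hid + [1.0]):
                row[j] -= lr * d * v
        for row, d in zip(self.w_hid, d_hid):
            for j, v in enumerate(list(x) + [1.0]):
                row[j] -= lr * d * v
        # Squared error measured before this update.
        return sum((o - t) ** 2 for o, t in zip(out, target))

    def predict(self, x):
        out = self.forward(x)[1]
        return out.index(max(out))


# Synthetic data (assumption): the rating depends only on the first feature.
rng = random.Random(42)
samples = []
for _ in range(80):
    x = [rng.random(), rng.random()]
    rating = 1 + int(x[0] * 10)  # integer in 1..10
    samples.append((x, attention_class(rating)))

train_set, test_set = samples[:60], samples[60:]  # 75% / 25% split

net = BPNetwork(n_in=2, n_hid=6, n_out=3)
losses = []
for epoch in range(300):
    losses.append(sum(net.train_step(x, y) for x, y in train_set) / len(train_set))

accuracy = sum(net.predict(x) == y for x, y in test_set) / len(test_set)
print(f"mean loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice a library classifier would normally replace the hand-written update loop; the explicit version is shown only to make the forward and backward passes visible.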
Key Findings
- Visual behaviour differences across scenes: Significant differences found in fixation and saccade/pupil metrics. Fixations: TFD (F=6.019, p=0.008), AFD (F=6.855, p=0.005). Saccade/pupil: SC (F=3.400, p=0.048), ASPV (F=5.167, p=0.014), APD (F=5.851, p=0.009). Traditional dwellings showed the longest TFD (mean 7.65 s) and largest SC, ASPV, and APD (22.00 counts; 171.22°/s; 2.89 mm). Public spaces had the smallest ASPV and APD (150.92°/s; 2.68 mm). AFD longest on streets (0.33 s) and shortest in courtyards (0.25 s). Mean ASA for traditional dwellings was 4.70°. Gaze plots showed initial fixations near the upper centre; attention to architectural details (windows, chimneys) in dwellings; central focus in symmetrical public buildings; vanishing points in streets; greenery observed later in courtyards.
- Visual attention ratings by scene: Means (highest to lowest): traditional dwellings 7.11; public buildings 7.00; streets 6.71; courtyards 6.40; public spaces 5.39. ANOVA indicated significant differences (F=27.164, p<0.001).
- Subjective evaluation differences: Significant differences across scenes (p<0.001). Traditional dwellings and public buildings scored higher overall; decorative refinement and colourfulness differed significantly (p<0.001) compared to streets, courtyards, and public spaces.
- Correlations: Visual attention negatively correlated with ASPV and ASA and positively with APD. Visual attention positively correlated with 12 subjective indicators (all except Traditional–Modern). Architectural features: Richness, Proportionate, Appropriate size correlated positively with SC and negatively with ASPV; Proportionate and Appropriate size positively with TFD; Appropriate size negatively with ASA. Gorgeous colour negatively correlated with TFD, AFD, ASPV and positively with FC. Environmental complexity correlated with multiple eye metrics (more areas of interest, higher SC, lower load). Greenery coverage positively correlated with FFD and ASPV. Spatial openness positively correlated with TFD and negatively with ASPV; Pleasantness negatively with ASPV and ASA.
- Element-level correlations with visual attention: TFD on chimneys and windows positively correlated; TFD on roofs negatively correlated. TFD on text signs and ground negatively correlated, suggesting inappropriate signage and large ground areas reduce attention.
- Predictive modelling: General model integrating eye-movement and subjective indicators achieved the highest accuracy (reported as 74.46% overall); group models showed improved prediction for low-attention scenes, with an integrated model accuracy reported at 68.27% in that context; no significant accuracy difference between group and general models (F=0.000, t=1.075, p>0.05). External validation on the independent test sample (n=24) showed: Eye tracking only: overall 59.47% (SD 0.05); class accuracies L 0.00%, M 95.05%, H 19.08%. Subjective only: overall 68.00% (0.02); L 53.43%, M 84.40%, H 39.51%. Combined: overall 70.21% (0.03); L 50.74%, M 86.43%, H 48.00%. No significant difference between general and test model accuracies (F=1.732, t=0.278, p>0.05), indicating generalisability.
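The overall and class-level (L, M, H) accuracies quoted for the validation models can be read as the share of correct predictions overall and within each true class. The sketch below shows that calculation on made-up labels. Treating class accuracy as within-class recall is an assumption, and the function name `class_accuracies` and the example labels are hypothetical.

```python
def class_accuracies(y_true, y_pred, labels=("L", "M", "H")):
    """Return overall accuracy and the accuracy within each true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for label in labels:
        # Correctness flags for the samples whose true class is `label`.
        hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
        per_class[label] = sum(hits) / len(hits) if hits else None
    return overall, per_class


# Hypothetical predictions from a model that favours the Middle class.
y_true = ["L", "L", "M", "M", "M", "M", "H", "H"]
y_pred = ["M", "M", "M", "M", "M", "M", "H", "M"]

overall, per_class = class_accuracies(y_true, y_pred)
print(f"overall: {overall:.1%}")
for label, acc in per_class.items():
    print(f"{label}: {acc:.1%}")
```

A model that defaults to the Middle class can post a respectable overall figure while missing every Low case, which is the pattern the eye-tracking-only results show (L 0.00%, M 95.05%).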
Discussion
Findings confirm that visual attention to built colonial heritage varies by scene and is systematically linked to specific eye-movement and subjective evaluation indicators. Negative correlations of visual attention with ASPV and ASA, and positive with APD, indicate that higher attention corresponds to slower, more targeted saccadic behaviour and heightened interest (larger pupil); this aligns with literature showing rapid, long saccades in monotonous scenes and the interest value reflected in pupil dilation. Public spaces, typically more open and less architecturally distinctive in this rural context, elicited lower attention and reduced exploration, while traditional dwellings and public buildings, rich in distinctive details and colours, drew stronger attention and higher ratings. Subjective evaluations consistent with Gestalt principles (proportion, order, symmetry) and environmental qualities (complexity, cleanliness, greenery, openness) were positively associated with visual attention, underscoring the importance of both architectural form and environmental context. The integrated BP neural network capitalises on the complementarity of objective eye metrics and subjective assessments, outperforming single-source models and offering robust classification of low, middle, and high attention levels. Practically, the model can identify scene types and elements that depress attention (e.g., excessive ground area, inappropriate signage) and guide conservation and renewal strategies to enhance attention—emphasising proportion, appropriate scale, refined decoration, colour logic, environmental cleanliness, greenery, and spatial openness.
Conclusion
The study develops and validates a quantitative method to distinguish and quantify visual attention to built colonial heritage by integrating eye-movement behaviour and subjective evaluation. Contributions: (1) Demonstrated significant differences in visual behaviour across scene types in fixation and saccade/pupil metrics; (2) Established that subjective evaluations vary by scene, with traditional dwellings and public buildings scoring higher in proportion, decorative refinement, cultural atmosphere, and attractiveness; (3) Identified key associations between attention and eye metrics (ASPV, ASA, APD) as well as 12 subjective indicators (richness, proportion, appropriate scale, refinement, colourfulness, complexity, cleanliness, cultural atmosphere, greenery, openness, pleasantness, attractiveness); (4) Built a BP neural network classifier that achieved the highest accuracy when combining eye-tracking and subjective indicators (overall ~74.46%), surpassing models using either input alone, and generalised to an independent participant sample. The approach can inform assessment of unprotected and renovated heritage and new construction around heritage to improve visual attention in colonial contexts. Future research should expand subjective evaluation dimensions, include diverse regions and environmental conditions (season, weather), leverage immersive or real-world stimuli, incorporate neural measures (EEG/fMRI) to probe aesthetic processes, and recruit more demographically diverse samples to compare local versus colonial cultural perceptions and expert versus non-expert differences.
Limitations
- Subjective evaluation dimensions were limited to three domains; expanding indicators could yield richer insights.
- Photographs cannot fully substitute for in-situ perception; image capture choices may bias results; seasonal and weather variations were not considered despite the extreme-climate context.
- The sample focused on one region with relatively homogeneous architectural features; cross-regional comparisons are needed.
- The study did not conclusively link aesthetic properties to specific eye movements/attention due to the complex, partly philosophical nature of aesthetics; future work should integrate EEG/fMRI.
- Participant pool (students) limits generalisability; larger, more diverse samples are needed, including comparisons between architectural experts and non-experts and between local and colonial cultural backgrounds.