
Psychology
Tracking historical changes in perceived trustworthiness in Western Europe using machine learning analyses of facial cues in paintings
L. Safra, C. Chevallier, et al.
This study by Lou Safra, Coralie Chevallier, Julie Grèzes, and Nicolas Baumard examines how perceived trustworthiness changed in Western Europe between roughly 1500 and 2000. Using a machine-learning model that estimates perceived trustworthiness from facial cues in historical portraits, the researchers find an upward trend over time and report that it is associated with improving living standards.
Introduction
The study asks whether social trust rose in Europe from the early modern period onward, and how such a change could be documented quantitatively when no direct measures from past populations exist. The authors treat cultural artifacts, particularly portraits, as cognitive fossils that preserve traces of historical mentalities. Indirect evidence for rising trust includes increases in religious tolerance, declines in witch hunts and honor killings, and growing intellectual freedom. Because the facial features humans use to judge trustworthiness (e.g., smiles, eye width) are depicted in portraits, they can in principle be quantified over time. The authors therefore aim to build and validate an algorithm that reproduces human-like ratings of perceived trustworthiness from facial action units, and then use its estimates to track historical trends and to test associations with societal factors such as affluence and democratization.
Literature Review
Prior work shows that preferences for friendly-looking, neotenous features influence portraiture and that direct-gaze portraits are more popular, reflecting broader social-cognitive preferences. Research in social perception documents facial cues to perceived trustworthiness that are consistent across individuals and cultures, including smiling and eye features, and identifies perceived trustworthiness and perceived dominance as two main dimensions along which faces are evaluated. Studies also show that first impressions vary with emotion, gender, age, and head orientation. Historical and cultural analyses describe shifts in prosocial displays and values, such as the 'Smile Revolution' and the rise of liberal values (tolerance, political freedom, democracy). Together, these findings support using facial cues in portraits as indicators of perceived trustworthiness and, by extension, as a proxy for social trust.
Methodology
Algorithm construction: The authors used OpenFace (v1.01, OpenCV 3.3.0) to detect facial action units (AUs) and trained models to estimate perceived trustworthiness and perceived dominance from them. Training data consisted of five sets of FaceGen avatars manipulated along validated dimensions of perceived trustworthiness and dominance (following Oosterhof and Todorov). About 3% of avatars were excluded because of poor face detection. Data were split into training (80%) and test (20%) sets, stratified by avatar set. Model selection compared linear models, random forests (randomForest R package), and support vector machines (kernlab) using repeated 20-fold cross-validation with a random hyperparameter search (caret). Random forests performed best, with an optimal mtry (number of predictors sampled at each split) of 9. Test-set performance was high (trustworthiness r ≈ 0.85, t(75)=14.17, p<0.001; dominance r ≈ 0.86, t(75)=14.72, p<0.001).
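The 80/20 split stratified by avatar set can be sketched as follows. This is a minimal Python illustration, not the authors' R/caret code; the function name stratified_split, the fixed seed, and the rule of rounding each set's test count are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(items, strata, test_frac=0.2, seed=0):
    """Split items into (train, test), drawing test_frac of each stratum.

    items:  sequence of records (e.g. avatar IDs)
    strata: parallel sequence of stratum labels (e.g. avatar set names)
    """
    if len(items) != len(strata):
        raise ValueError("items and strata must have the same length")
    rng = random.Random(seed)  # fixed seed -> reproducible split
    groups = defaultdict(list)
    for item, label in zip(items, strata):
        groups[label].append(item)

    train, test = [], []
    for label in sorted(groups):  # deterministic stratum order
        members = list(groups[label])
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)  # assumption: round per stratum
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Usage with five hypothetical avatar sets of 80 faces each.
avatars = list(range(400))
avatar_set = [f"set{i // 80}" for i in avatars]
train, test = stratified_split(avatars, avatar_set)
print(len(train), len(test))  # 320 80
```

With these hypothetical sizes, each set contributes 16 faces to the test partition. Stratifying this way keeps every avatar set represented in both partitions, so test performance does not hinge on one set being held out entirely.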
Validation: Predictions were evaluated on natural-face databases: Karolinska (N=70), Oslo (N=185), and Chicago (N=520), which include human ratings, and FEI (N=520), which has no subjective ratings. Model estimates correlated with participants' ratings (trustworthiness r=0.22, p<0.001; dominance r=0.16, p<0.001; N=768 using neutral, frontal faces). The algorithm also reproduced known effects of gender (female faces more trustworthy and less dominant), emotion (happy faces more trustworthy, angry faces more dominant), and age (older faces less trustworthy and more dominant), and its estimates were stable across head orientations. External validation on portraits from Google Images (N=633) showed higher perceived trustworthiness (t(632)=7.89, p<0.001) and lower perceived dominance (t(632)=-11.79, p<0.001) for women than for men. The algorithm is intended to model human perceptions, not actual traits.
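The correlations reported in the validation and test-set results relate to their t statistics through the standard conversion t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The sketch below implements Pearson's r and that conversion in plain Python; the function names and the five-face example data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical model estimates vs. mean human ratings for five faces.
model = [0.2, 0.5, 0.1, 0.9, 0.4]
human = [3.1, 4.0, 2.8, 4.6, 3.9]
print(round(pearson_r(model, human), 3))
print(round(r_to_t(0.22, 768), 2))   # 6.24 (df = 766)
print(round(r_to_t(0.85, 77), 2))    # 13.97 (df = 75)
```

For the test set (n = 77, implied by t(75)), r = 0.85 gives t ≈ 13.97; the reported t(75)=14.17 is consistent with an unrounded r of about 0.853.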
Historical datasets and processing: Two portrait collections were processed with OpenFace: the National Portrait Gallery (NPG; N=1962 English portraits analyzed, 1505–2016) and the Web Gallery of Art (WGA; N=4106 portraits from 19 Western European countries, 1360–1918). Independent coders rated the quality of the face-detection fit, and these ratings were used as analytic weights; sitter gender and age were also coded. For NPG, only portraits painted during the sitter's lifetime were included. For WGA, each portrait was geocoded by the painter's location at the time of painting, and mixed-effects models included random effects for country.
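Two of the processing steps described above, keeping only portraits painted during the sitter's lifetime and weighting each portrait by its coder-rated fit quality, can be sketched as a weighted least-squares time trend with time in units of 100 years. This is a deliberately simplified, hypothetical illustration: the Portrait fields and function names are invented for the example, and it omits the perceived-dominance, gender, and age covariates and the country random effects used in the actual models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Portrait:                 # hypothetical record layout
    year: int                   # year the portrait was painted
    trust: float                # model estimate of perceived trustworthiness
    fit_quality: float          # coder rating of face-detection fit (> 0)
    sitter_birth: int
    sitter_death: Optional[int] = None  # assumption: None means unknown/alive


def painted_in_lifetime(p: Portrait) -> bool:
    """Keep portraits painted between the sitter's birth and death."""
    if p.year < p.sitter_birth:
        return False
    return p.sitter_death is None or p.year <= p.sitter_death


def weighted_slope(xs, ys, ws):
    """Weighted least-squares slope of ys on xs with analytic weights ws."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


def trust_trend_per_century(portraits, origin=1500):
    """Change in perceived trustworthiness per 100 years (time-only model)."""
    kept = [p for p in portraits if painted_in_lifetime(p)]
    xs = [(p.year - origin) / 100 for p in kept]  # 1 unit = 100 years
    ys = [p.trust for p in kept]
    ws = [p.fit_quality for p in kept]
    return weighted_slope(xs, ys, ws)


portraits = [
    Portrait(1550, 0.10, 1.0, 1520, 1580),
    Portrait(1650, 0.20, 0.5, 1600, 1670),
    Portrait(1750, 0.30, 2.0, 1700, 1790),
    Portrait(1760, 9.99, 1.0, 1690, 1740),  # painted after death: excluded
]
print(round(trust_trend_per_century(portraits), 3))  # 0.1
```

Because the slope is a weighted average over kept portraits, a poorly fitted face contributes less to the trend, and the posthumous outlier has no influence at all.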
Societal covariates: GDP per capita (Maddison Project) and Polity2 democratization index (Polity IV) were compiled (UK GDP 1500–2000; Polity2 yearly from 1800). Missing values (except in time-lag analyses) were forward-filled with the closest previous value. Number of book titles per capita was used as an alternative affluence proxy.
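Forward-filling a sparse yearly covariate with the closest previous value can be sketched as follows. The dictionary-based representation, the function name, and the GDP numbers are hypothetical; with pandas, Series.ffill() applies the same rule to a series indexed by year.

```python
from typing import Dict, Optional


def forward_fill_by_year(series: Dict[int, float], start: int,
                         end: int) -> Dict[int, Optional[float]]:
    """Fill each year in [start, end] with the closest previous observed value."""
    filled: Dict[int, Optional[float]] = {}
    last: Optional[float] = None  # nothing observed yet
    for year in range(start, end + 1):
        value = series.get(year)
        if value is not None:
            last = value  # a new observation replaces the carried value
        filled[year] = last
    return filled


# Hypothetical sparse GDP-per-capita series.
gdp = {1500: 1100.0, 1503: 1150.0}
print(forward_fill_by_year(gdp, 1499, 1505))
# {1499: None, 1500: 1100.0, 1501: 1100.0, 1502: 1100.0, 1503: 1150.0, 1504: 1150.0, 1505: 1150.0}
```

Years before the first available observation stay missing (None), since there is no earlier value to carry forward.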
Statistical analyses: Individual-level linear models (one data point per painting) regressed perceived trustworthiness estimates on time (scaled so that 1 unit = 100 years), GDP per capita, and democratization, controlling for perceived dominance, sitter gender, and age. For WGA, two-level mixed models clustered by country were used. Bayes factors were approximated from BIC. Time-lag analyses aggregated data by decade: perceived trustworthiness at decade d was modeled from concurrent dominance, a linear time trend, trustworthiness and dominance lagged by two decades (d-2), and GDP or democratization lagged by two decades (d-2). Reverse models tested whether lagged trustworthiness predicted future GDP or democratization. Robustness checks used a one-decade lag.
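The decade-level time-lag design can be sketched as a data-preparation step: average painting-level estimates within each decade, then pair each decade d with trustworthiness, dominance, and GDP from decade d − lag (lag = 2 for the main analyses, 1 for the robustness check). The input format and names are hypothetical, and the sketch assumes that decades lacking a lagged observation are dropped rather than imputed, in line with missing values not being forward-filled for the time-lag analyses. The resulting rows would then be fitted with a regression of trust on dominance, a linear decade trend, trust_lag, dominance_lag, and gdp_lag; swapping the roles of trustworthiness and GDP gives the reverse model.

```python
from collections import defaultdict
from statistics import mean


def decade_panel(paintings, gdp_by_decade, lag=2):
    """Build decade-level rows for the time-lag models.

    paintings:     iterable of (year, trust, dominance) tuples
    gdp_by_decade: {decade_start_year: gdp_per_capita}
    lag:           lag in decades (2 in the main analyses, 1 as a robustness check)
    """
    by_decade = defaultdict(list)
    for year, trust, dominance in paintings:
        by_decade[year // 10 * 10].append((trust, dominance))

    # Decade means of the painting-level estimates.
    means = {
        d: (mean([t for t, _ in rows]), mean([m for _, m in rows]))
        for d, rows in by_decade.items()
    }

    panel = []
    for d in sorted(means):
        prev = d - 10 * lag
        if prev not in means or prev not in gdp_by_decade:
            continue  # assumption: no lagged observation -> drop, do not impute
        trust, dominance = means[d]
        trust_lag, dominance_lag = means[prev]
        panel.append({
            "decade": d,
            "trust": trust,
            "dominance": dominance,
            "trust_lag": trust_lag,
            "dominance_lag": dominance_lag,
            "gdp_lag": gdp_by_decade[prev],
        })
    return panel


# Hypothetical paintings (year, trust, dominance) and decade-level GDP.
paintings = [(1500, 1.0, 0.0), (1505, 3.0, 2.0), (1510, 0.5, 0.5), (1520, 4.0, 1.0)]
gdp = {1500: 100.0, 1510: 110.0}
for row in decade_panel(paintings, gdp):
    print(row)
```

With a two-decade lag only the 1520s row survives here, because it is the only decade whose lagged decade (1500s) has both paintings and GDP.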
Key Findings
- Historical increase: Perceived trustworthiness in portraits increased over time.
• NPG: b=0.14±0.02, z=7.49, p<0.001 (time coded per 100 years; adjusted for perceived dominance).
• WGA: b=0.07±0.01, z=5.33, p<0.001.
- Association with affluence: Higher GDP per capita was associated with higher perceived trustworthiness.
• NPG: b=0.03±0.01, z=7.13, p<0.001; controlling for time b=0.02±0.01, z=3.16, p=0.002; Bayes Factor (GDP vs time model)=3.38.
• WGA: b=0.09±0.03, z=3.16, p=0.002; controlling for time b=0.07±0.04, z=1.98, p=0.048; Bayes Factor=130.16 in favor of GDP model.
• Alternative proxy: Number of book titles per capita positively associated with perceived trustworthiness (NPG: b=0.35±0.06, z=6.15, p<0.001; controlling time b=0.21±0.06, z=3.45, p=0.001). WGA showed a positive association without time control (b=0.29±0.10, z=2.77, p=0.006) that was not robust when controlling for time (b=0.14±0.11, z=1.26, p=0.208).
- Democratization: Polity2 showed a positive association in NPG (b=0.03±0.01, z=5.24, p<0.001) that disappeared when controlling for time (b=-0.01±0.01, z=-0.50, p>0.250). In WGA, democratization was not positively associated (b=-0.01±0.01, z=-1.96, p=0.051; with time b=-0.01±0.01, z=-0.96, p>0.250). Models with GDP per capita outperformed democratization models (NPG BF=2.75; WGA BF=6.16).
- Time-lag analyses: Changes in GDP per capita predicted future increases in perceived trustworthiness approximately two decades later.
• NPG: GDP lag effect F(1,40)=12.38, p=0.001; democratization lag F(1,15)=0.11, p>0.250.
• WGA: GDP lag (two decades) χ2(1)=6.42, p=0.011; democratization lag χ2(1)=0.81, p>0.250.
• No evidence that perceived trustworthiness predicted future GDP: NPG F(1,41)=0.76, p>0.250; WGA χ2(1)=2.02, p=0.155.
- External validity with contemporary behavior: In the SelfieCity dataset (N=2277 selfies across 6 cities, 2013), higher city-level cooperation and trust (from EVS/WVS) were associated with higher perceived trustworthiness in selfies (cooperation b=0.13±0.03, z=3.67, p<0.001; trust b=0.81±0.23, z=3.50, p<0.001).
- Algorithm validity: The model's outputs correlated with human ratings in the face databases that include them, replicated known effects (younger, female, and happy faces judged more trustworthy; stable estimates across head orientations), and generalized to other images (e.g., Google Images portraits; N=633; trust t(632)=7.89, p<0.001; dominance t(632)=-11.79, p<0.001).
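The Bayes factors reported above were approximated from BIC. Under the common approximation BF_10 ≈ exp((BIC_0 − BIC_1) / 2), each reported Bayes factor maps to a BIC difference of 2·ln(BF). The helper names below are illustrative; this is the generic approximation, not code from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def bf_from_bic(bic_model_1, bic_model_0):
    """Approximate Bayes factor BF_10 (model 1 over model 0) from two BIC values."""
    return math.exp((bic_model_0 - bic_model_1) / 2.0)


def bic_gap_for(bf_10):
    """BIC difference (BIC_0 - BIC_1) implied by a given BF_10."""
    return 2.0 * math.log(bf_10)


for bf in (3.38, 130.16):
    print(bf, round(bic_gap_for(bf), 2))  # 3.38 2.44, then 130.16 9.74
```

By this rule, BF = 3.38 (NPG) corresponds to a BIC advantage of about 2.44 for the favored model, and BF = 130.16 (WGA) to about 9.74.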
Discussion
The findings indicate that displays associated with perceived trustworthiness in Western European portraiture increased from the 16th to the 20th century, consistent with qualitative accounts of cultural shifts (e.g., a 'Smile Revolution' and rising liberal values). The association between perceived trustworthiness and GDP per capita, including time-lag evidence that GDP changes preceded changes in perceived trustworthiness, suggests that improving living standards may foster environments where trust and prosocial displays are more prevalent. Democratization showed weaker and non-robust associations when controlling for time. The algorithm estimates human-like perceptions rather than actual traits and is sensitive to known perceptual biases; nevertheless, its convergence with human ratings and ability to reproduce established social cognition effects support its use for historical inference. These results highlight how cognitive-science methods can inform cultural evolution by quantifying psychological signals embedded in artifacts and linking them to macro-societal factors.
Conclusion
The study introduces a machine-learning approach to estimate perceived trustworthiness from facial cues in portraits and applies it to large historical datasets. Perceived trustworthiness in Western European portraits increased over several centuries and is more strongly associated with affluence (GDP per capita; number of book titles per capita) than with institutional democratization. Time-lag analyses suggest that economic growth precedes increases in displays associated with perceived trustworthiness. The approach complements qualitative history by providing quantitative evidence and a novel proxy for social trust. Future work should further validate cue stability across time, improve alignment with human ratings (e.g., by incorporating facial texture and hairstyle cues), refine economic and inequality measures, and extend the analysis to broader, less elite samples to test generalizability.
Limitations
- Sample representativeness: Historical portraits largely depict elites; findings may not generalize to broader populations, and social attitudes can vary with socioeconomic status.
- Cue stability over time: The approach assumes that facial cues used to judge trustworthiness are stable across centuries; while recent evidence supports some stability, this requires further testing.
- Measurement limits: Historical GDP and living standard estimates are imperfect and may not capture inequality or all aspects of wealth; book titles per capita is an indirect proxy.
- Algorithm-human alignment: Correlations with human ratings are modest; training on texture-free avatars and blindness to hairstyle and other non-facial cues may limit performance. The algorithm focuses on shared components of first impressions, excluding idiosyncratic factors.
- Construct validity: The algorithm estimates perceived trustworthiness on images, not actual trustworthiness; first impressions are influenced by factors like lighting and pose. Participant ratings of historical portraits could be biased by stylistic and historical cues, complicating direct validation on such images.