The study challenges the long-held 'gesture-first' hypothesis of language origin and the notion that sound-meaning relationships are purely arbitrary. A growing body of research reveals systematic sound symbolism across languages: vowels like [i] are consistently associated with smaller objects and [a] with larger ones, and the kiki/bouba effect links certain sounds to certain shapes. The study focuses on 'Pokémonastics', the analysis of sound-symbolic patterns in Pokémon names, where sound-symbolic effects are stronger for traits relevant to in-game survival (such as combat attributes) than for less impactful ones (such as gender). Previous research using random forest (RF) algorithms to classify Pokémon by the sounds of their names showed an overprediction of 'post-evolution' (stronger) Pokémon, suggesting a bias toward threat; similar biases appeared in RF models classifying human names by gender. The present study re-examines this bias through the lens of error management theory (EMT), which posits that evolutionary selection favors minimizing the costlier type of error. Applied to the sound symbolism of threat, EMT implies that false positive (FP) errors (classifying a non-threat as a threat) may be less costly than false negatives (missing a real threat), so errors should skew toward FPs.
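Stated more concretely, the EMT argument amounts to an asymmetric-cost decision rule: if a miss is costlier than a false alarm, the expected-cost-minimizing listener should call ambiguous signals threats. The sketch below only illustrates that logic; the cost values and function names are assumptions, not figures from the study.

```python
def expected_cost(p_threat: float, call_threat: bool,
                  cost_fp: float = 1.0, cost_fn: float = 10.0) -> float:
    """Expected cost of one decision: a false alarm can only occur if we
    call 'threat'; a miss can only occur if we do not."""
    if call_threat:
        return (1.0 - p_threat) * cost_fp
    return p_threat * cost_fn


def call_threat(p_threat: float) -> bool:
    """Choose whichever call has the lower expected cost. With misses ten
    times costlier than false alarms (an illustrative ratio), the decision
    threshold drops to cost_fp / (cost_fp + cost_fn) ~= 0.09, so even a
    signal that is only ~10% likely to be a threat gets called a threat."""
    return expected_cost(p_threat, True) <= expected_cost(p_threat, False)


print(call_threat(0.10))  # True  -> errors skew toward false positives
print(call_threat(0.05))  # False -> only very unlikely threats are dismissed
```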
Literature Review
The literature review traces the debate between arbitrary and non-arbitrary sound-meaning relationships back to Plato's Cratylus. It discusses the stochastic nature of sound symbolism, with statistical analyses revealing cross-linguistic patterns. Studies using nonce-word stimuli show consistent associations between sounds and size (the frequency code hypothesis), while others focus on shape (the kiki/bouba effect). The study draws heavily on earlier Pokémonastic work demonstrating cross-linguistic sound-symbolic patterns, particularly for traits relevant to in-game survival. Prior work with random forest algorithms found that they tend to overpredict threatening categories, an overprediction pattern that serves as the foundation for the current hypothesis and methodology.
Methodology
This study uses XGBoost models rather than the RF models of previous studies to determine whether the skew in the error distribution is algorithm-specific or a more general phenomenon. The dataset comprises Japanese, Korean, and Chinese Pokémon names converted into counts of speech features. Because the feature set contains many null values, 3-fold cross-validation was used to mitigate overfitting. Four dependent variables (Attack, Defend, Height, and Weight) were tested, each binarized by a median split. Height and Weight serve as control variables because, unlike Attack and Defend, they do not affect in-game combat. Two hypotheses were tested: H1, that the FP error rate will exceed 50% (i.e., errors will skew toward false positives) in all models; and H2, that the combat-parameter models will show higher FP error rates than the size models. In addition, linear regression models were used to relate the continuous variables (including samples omitted from the classification models) to name length, testing the 'longer-is-stronger' principle that stronger Pokémon have longer names.
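A minimal sketch of this setup, assuming a pandas dataframe `df` with one row per name, phonetic feature-count columns listed in `feature_cols`, and numeric stat columns such as "Attack"; the column names, hyperparameters, and helper function are illustrative, not the study's actual code:

```python
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import confusion_matrix


def fp_share_of_errors(df: pd.DataFrame, feature_cols, stat: str = "Attack",
                       seed: int = 0) -> float:
    """Median-split `stat` into a binary label (1 = the 'high', more
    threatening half), fit an XGBoost classifier under 3-fold
    cross-validation, and return the share of errors that are false
    positives, FP / (FP + FN); values above 0.5 indicate an FP skew."""
    median = df[stat].median()
    mask = df[stat] != median          # drop names sitting exactly on the median
    X = df.loc[mask, feature_cols].to_numpy()
    y = (df.loc[mask, stat] > median).astype(int).to_numpy()

    model = XGBClassifier(n_estimators=200, max_depth=4,
                          eval_metric="logloss", random_state=seed)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(model, X, y, cv=cv)

    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    return fp / (fp + fn)
```

The 'longer-is-stronger' check described at the end of the paragraph would correspond to an ordinary least-squares fit of the raw stat values on name length (for example with `scipy.stats.linregress`), presumably run on all names, including those omitted from the median split.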
Key Findings
All models achieved classification accuracy above 50%. However, only the Attack model (and the combined combat model) showed an FP error rate significantly above chance, so H1 was only partially supported. Combat models consistently exhibited higher FP error rates than size models, supporting H2. Linear regression analyses revealed significant positive correlations between name length and the combat parameters (Attack and Defend), but not the size parameters (Height and Weight), in the languages examined other than Chinese. Table 4 presents the aggregated results: combat models showed average FP rates of around 55%, while size models averaged around 48%, and the difference in FP error rates between combat and size models was statistically significant (t(34) = 25.55, p < 0.001).
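As a rough illustration of how such a comparison can be run, the sketch below aggregates per-model FP error shares and applies an independent-samples t-test; the placeholder values and the choice of test are assumptions, not the study's data or procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder FP shares only (NOT the study's results), one value per fitted
# model, centred near the averages reported in Table 4.
combat_fp_shares = rng.normal(loc=0.55, scale=0.02, size=18)  # Attack/Defend models
size_fp_shares = rng.normal(loc=0.48, scale=0.02, size=18)    # Height/Weight models

t_stat, p_value = stats.ttest_ind(combat_fp_shares, size_fp_shares)
df_t = len(combat_fp_shares) + len(size_fp_shares) - 2
print(f"t({df_t}) = {t_stat:.2f}, p = {p_value:.3g}")
```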
Discussion
The results partially support the hypothesis that algorithms classifying threat skew toward FP errors. The higher FP error rates in the combat models align with the EMT prediction: ambiguous signals are more likely to be classified as threats, minimizing the cost of false negatives. The absence of a significant FP skew in the size models suggests the effect is tied to perceived threat rather than to mere class-size imbalance. The 'longer-is-stronger' principle, although reflected in a correlation between name length and strength, is unlikely to be the sole explanation for the observed FP bias. The consistency of the pattern across algorithms and languages strengthens the findings, suggesting an inherent tendency in how sound-symbolic information is processed rather than an artifact of a particular algorithm.
Conclusion
This study demonstrates that XGBoost models classifying Pokémon names by combat parameters (Attack, Defend) show a bias toward FP errors, suggesting an evolutionarily advantageous preference for cautious interpretation of threat-related sound symbolism. Further research with natural-language data, a broader range of languages, and a larger corpus is needed to confirm the effect and to explore its implications for language evolution, including a potential challenge to the gesture-first hypothesis.
Limitations
The study relies on a limited corpus of video game character names rather than natural-language data, and the analysis focuses primarily on East Asian languages. The inner workings of the XGBoost algorithm are not fully transparent, making it difficult to explain definitively why the FP bias arises. Finally, extending the result from algorithms to humans risks anthropomorphizing AI; whether the same bias operates in human listeners requires further validation.