Introduction
The study of emotion in psychology has long relied on common-sense categories inherited from folk psychology, yet these categories may not map cleanly onto physical measurements of the brain, body, and behavior. This paper examines whether a scientific description of the human mind requires moving beyond the assumption that folk categories represent ground truth. The researchers compare supervised and unsupervised machine learning across three experimental settings: fMRI brain activity, ambulatory peripheral physiological signals, and self-reported emotional experiences. Because unsupervised learning does not rely on pre-assigned labels, it offers a potentially less biased way to uncover underlying structure in the data; comparing its output with that of supervised models allows the study to assess the extent to which psychological categories can be treated as objective ground truth. The aim is to demonstrate the importance of critically examining the labels researchers impose, rather than to explain discrepancies between methods or to support a specific hypothesis. Emotion categories serve as the test case because of their complexity and the ongoing debate about their underlying structure. The datasets selected represent common research practices in emotion science, encompassing a range of induction techniques and measurement modalities.
Literature Review
The paper reviews past studies that have applied supervised machine learning techniques to identify biomarkers for pre-defined mental categories. It highlights inconsistencies in the reported patterns associated with specific emotion categories across different studies, even when similar methods and stimuli were used. These inconsistencies could be attributed to methodological factors such as small sample sizes, varying affect induction methods, preprocessing workflows, and classification algorithms. However, the authors argue that the persistent variation observed across decades of research suggests that a purely methodological explanation may be insufficient. The substantial within-category variability and cross-category similarity observed in neural activity, autonomic nervous system (ANS) activity, and behavior further support the notion that imposing single labels on data may be overly simplistic and could obscure the discovery of more meaningful categories. The authors cite examples of physiological responses varying across instances of the same emotion category due to differences in situations and resulting actions (e.g., running, freezing, or attacking in response to fear). Similar variations are found in facial expressions and neural correlates, emphasizing the complexity of the relationship between presumed emotion categories and physiological responses.
Methodology
The study analyzes three datasets: (1) fMRI data from a study in which participants immersed themselves in auditory scenarios designed to evoke happiness, sadness, or fear; (2) ambulatory peripheral physiological signals (electrocardiography, ECG, and impedance cardiography, ICG) and self-reported emotions collected from participants throughout their daily lives; and (3) self-report ratings of emotional experiences in response to movie clips.
For the fMRI data, a 3D Convolutional Neural Network (CNN) with sixfold cross-validation was used for supervised classification, and a Gaussian Mixture Model (GMM) with Bayesian Information Criterion (BIC) for model selection was used for unsupervised clustering. The GMM's sensitivity was validated using synthetic data.
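The GMM-with-BIC step can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the paper's actual pipeline: the dimensionality, the candidate range of cluster counts, and the diagonal covariance structure are all illustrative assumptions.

```python
# Sketch: unsupervised GMM clustering with BIC model selection.
# Synthetic stand-in data; real fMRI features would replace X.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated synthetic "response patterns" in 5 dimensions.
X = np.vstack([rng.normal(0, 1, (60, 5)), rng.normal(4, 1, (60, 5))])

bic_by_k = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, covariance_type="diag",
                          random_state=0).fit(X)
    bic_by_k[k] = gmm.bic(X)  # lower BIC = better fit/complexity trade-off

best_k = min(bic_by_k, key=bic_by_k.get)  # BIC-selected number of clusters
labels = GaussianMixture(n_components=best_k, covariance_type="diag",
                         random_state=0).fit_predict(X)
```

Fitting a separate model for each candidate number of components and keeping the one with the lowest BIC is the standard way to let the data, rather than the experimenter's labels, determine how many clusters are present.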
For the ANS data, a fully connected neural network with fivefold cross-validation was used for supervised classification, and Dirichlet Process Gaussian Mixture Modeling (DP-GMM) was used for unsupervised clustering. A permutation test was used to assess statistical significance due to varying numbers of events per label.
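The two ingredients named above can be sketched together on toy data: a Dirichlet Process GMM, which infers the number of clusters rather than fixing it, and a label-permutation test for above-chance classification. All sizes, priors, and the logistic-regression stand-in classifier are illustrative assumptions, not the paper's actual models.

```python
# Sketch: (1) DP-GMM clustering, (2) label-permutation significance test.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import BayesianGaussianMixture
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (80, 2)), rng.normal(3, 1, (80, 2))])
y = np.repeat([0, 1], 80)

# (1) DP-GMM: n_components is only an upper bound; the stick-breaking
# prior shrinks the weights of unneeded components toward zero.
dpgmm = BayesianGaussianMixture(
    n_components=10, weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1, max_iter=500, random_state=0).fit(X)
effective_k = len(np.unique(dpgmm.predict(X)))

# (2) Permutation test: compare observed cross-validated accuracy with a
# null distribution obtained by shuffling the labels.
clf = LogisticRegression()
obs = cross_val_score(clf, X, y, cv=5).mean()
null = [cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
        for _ in range(200)]
p_value = (np.sum(np.array(null) >= obs) + 1) / (len(null) + 1)
```

Shuffling the labels breaks any real label-feature association while preserving the class proportions, which is why permutation tests are appropriate when the number of events per label varies. (scikit-learn also provides `permutation_test_score` for this pattern.)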
For the self-report data, Latent Dirichlet Allocation (LDA) was used for unsupervised clustering of the categorical emotion ratings, and a GMM, with BIC for model selection, was applied to 14-dimensional affective features; a neural network with eightfold cross-validation was used for supervised classification of those same affective features. In both the supervised and unsupervised analyses, only categories with sufficient samples (at least 2.9% of videos) were included, and oversampling was used to address class imbalance in the supervised analysis.
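The LDA step treats each clip's categorical ratings like a "document" of emotion-word counts. The sketch below uses a synthetic videos-by-labels count matrix; the 8 emotion words, 3 topics, and Poisson-generated counts are illustrative assumptions, not the study's data.

```python
# Sketch: topic-model clustering of categorical emotion ratings with LDA.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Each row: how many raters applied each of 8 emotion words to one clip.
counts = rng.poisson(lam=2.0, size=(100, 8))

lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
theta = lda.transform(counts)  # per-clip topic mixture; each row sums to 1
# Most characteristic emotion words for each topic (indices into the vocab).
top_words = lda.components_.argsort(axis=1)[:, ::-1][:, :3]
```

If the folk categories captured the structure of the ratings well, the inferred topics would each load heavily on a single emotion word; diffuse topic-word distributions instead indicate that ratings cut across the assigned categories.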
Key Findings
Across all three datasets, the study found inconsistencies between supervised and unsupervised methods. Supervised analyses achieved above-chance classification accuracy in all three cases, indicating the presence of information related to emotion category labels. However, unsupervised analyses revealed clusters that did not consistently correspond with the emotion category labels.
In the fMRI analysis, the GMM identified a variable number of clusters across participants, with no clear correspondence between the clusters and the experimenter-provided labels.
In the ANS analysis, DP-GMM revealed a variable number of clusters for each participant, with a many-to-many correspondence between participant-generated emotion labels and the discovered clusters.
In the self-report data, LDA did not reveal clear clusters in the categorical emotion ratings, while GMM analysis of the affective features found three clusters that did not neatly align with the assigned emotion categories. The supervised analysis of the self-report data, however, achieved statistically significant above-chance performance.
Discussion
The findings suggest two possibilities: (1) emotion category labels represent genuine biological categories but measurement limitations obscure the underlying structure; or (2) folk emotion categories are not adequate for capturing the complexity and variability of emotional experience. The inconsistencies between supervised and unsupervised results highlight the importance of considering the validity of emotion category labels and potential limitations in experimental designs. The authors argue that the variability observed in past studies may not simply reflect error, but rather highlights the need for more nuanced approaches to studying emotion. The use of consistent stimuli across studies and the inclusion of a wider range of emotional instances could improve the reliability of results. The authors also propose that future studies consider combinations of emotion words to better capture complex emotional experiences.
Conclusion
This study underscores the importance of critically evaluating the assumptions and validity of using folk psychology categories in psychological research. The observed discrepancies between supervised and unsupervised approaches suggest that emotion categories may not be simple, stable biological kinds. Future research should focus on more comprehensive data collection, including internal and external context, to better understand the structure of emotional experience. The authors recommend routinely comparing supervised and unsupervised approaches, employing multiple methods, and increasing the power of studies to enhance the reliability and reproducibility of findings. This approach is relevant to the study of psychological categories beyond the domain of emotion.
Limitations
The study's findings are based on three specific datasets, limiting the generalizability of the results. The different data types and methodologies used across datasets may have influenced the results, and the validation procedure using synthetic data was performed only for the fMRI dataset. Furthermore, the specific machine learning methods used could have influenced the outcomes, though several methods were compared across datasets and analyses. Finally, interpreting above-chance classification accuracy in the supervised analyses requires consideration of potential alternative explanations beyond the validity of the emotion categories themselves.