Modelling dataset bias in machine-learned theories of economic decision-making



T. Thomas, D. Straub, et al.

Tobias Thomas, Dominik Straub, Fabian Tatai, Megan Shene, Tümer Tosik, Kristian Kersting, and Constantin A. Rothkopf investigate dataset bias in machine-learned theories of economic decision-making. They show that data collected online can contain more decision noise than data from laboratory studies, and that a new probabilistic generative model accounting for this noise yields improved predictions.

Introduction
Predicting and explaining human risky choices has been a central challenge across various disciplines, including psychology, economics, and neuroscience. Normative models focus on how people *should* decide, while descriptive models aim to capture how people *actually* decide. The latter often deviate from normative predictions. Recent advances in machine learning, particularly neural networks (NNs), have offered the potential to build more accurate descriptive models of decision-making, even automating theory development by training NNs on large datasets of choices. However, this approach presents challenges. The relationship between theory, models, and data is intricate. Data collection itself is theory-laden, and the properties of datasets can interact significantly with the models used to analyze them. Three key issues are obtaining representative data, selecting models that balance interpretability and expressiveness, and understanding dataset-model interactions. For instance, a complex model might overfit a small dataset, while a simple model might unexpectedly perform well on a large, complex dataset. Previous studies have addressed some of these issues by collecting large datasets (e.g., choices13k) and using cross-validation, but the interplay of datasets and models warrants further investigation.
Literature Review
The authors review existing literature on normative and descriptive models of decision-making under risk, highlighting the limitations of classic economic theory in accurately predicting human behavior. They discuss the rise of machine learning and NNs in modeling human choices, referencing studies that used large-scale datasets and complex NN architectures to achieve improved predictive accuracy. The limitations of relying solely on data-driven approaches without theoretical grounding are also emphasized. The paper stresses the importance of carefully considering the interaction between theory, models, and data: theory should inform data collection and analysis, and model selection must weigh both interpretability and accuracy. The use of cross-validation and of NN architectures of increasing complexity is noted as a significant methodological advance.
Methodology
The study systematically trained several machine-learning models (BEAST, random forests, support-vector machines, and the NN architectures of Bourgin et al. and Peterson et al.) on three datasets: CPC15 (laboratory), CPC18 (laboratory), and choices13k (collected online via Amazon Mechanical Turk). Transfer testing, i.e., training on one dataset and evaluating on another, was used to assess model generalization across datasets. To investigate the source of the dataset bias, feature importance weights, a technique from explainable AI (XAI), were used to identify gamble features predictive of the differences between predictions of models trained on CPC15 and on choices13k. Linear regressions assessed the predictive power of different feature sets (basic gamble features, naive features, and psychological features). A hybrid probabilistic generative model was then developed to capture the effect of decision noise on choices: it combines a NN trained on CPC15 with a generative Bayesian network in which a proportion of subjects guesses randomly and the remainder makes choices subject to added decision noise in log-odds space. The model's parameters (the noise level and the proportion of guessing subjects) were inferred using probabilistic programming. SHAP (SHapley Additive exPlanations) values were additionally computed to analyze feature importance at the level of individual NN predictions.
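The forward direction of the hybrid model can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `p_nn`, `noise_sd`, and `p_guess` are placeholder names for the NN's predicted choice probability, the log-odds noise level, and the guessing proportion, the latter two being the parameters the paper infers via probabilistic programming.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def hybrid_choice_proportion(p_nn, noise_sd, p_guess,
                             n_samples=200_000, seed=0):
    """Expected population choice proportion under the hybrid model:
    a fraction p_guess of subjects guesses at random (proportion 0.5);
    the remainder choose according to the NN prediction p_nn perturbed
    by zero-mean Gaussian noise in log-odds space."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, noise_sd, n_samples)
    # Average the noisy choice probabilities over many simulated subjects.
    p_noisy = sigmoid(logit(p_nn) + eps).mean()
    return p_guess * 0.5 + (1 - p_guess) * p_noisy
```

Because symmetric noise in log-odds space is averaged through the sigmoid, extreme predictions are pulled toward 0.5, which is the mechanism the paper invokes to explain the less extreme choice proportions in choices13k.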
Key Findings
The analysis revealed clear signatures of dataset bias. Models trained on choices13k (a large online dataset) performed poorly when applied to smaller laboratory datasets (CPC15 and CPC18), and vice versa. Feature importance analysis indicated that psychological features, particularly those relating to the degree to which one gamble is expected to yield a higher payoff than the alternative (e.g., stochastic dominance, difference in expected value, probability of a higher outcome), were highly predictive of the discrepancies between models trained on different datasets. Crucially, the choices13k data showed less extreme choice proportions, suggesting increased decision noise. The hybrid model, incorporating structured decision noise and a proportion of random guessing, significantly improved performance on the choices13k dataset, accounting for more than half of the discrepancy between datasets. SHAP analysis provided local explanations for the NN's decisions; however, these were not directly interpretable in terms of the psychological features and naive features that captured prediction differences more clearly. The authors argue that the size of the dataset alone is not sufficient to guarantee the generalizability of machine-learned theories of decision-making; the context of data collection, including potential for increased noise in online studies, should be taken into account.
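The compression of choice proportions by decision noise can be checked numerically: adding zero-mean Gaussian noise in log-odds space and averaging over trials pulls extreme proportions toward 0.5 while leaving 0.5 itself fixed. A minimal sketch, with an arbitrary noise level of 1.5 chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Noise-free choice probabilities, from extreme to moderate.
p_true = np.array([0.05, 0.25, 0.50, 0.75, 0.95])

# Perturb each probability in log-odds space with zero-mean Gaussian
# noise, then average the resulting probabilities over many trials.
log_odds = np.log(p_true / (1 - p_true))
noise = rng.normal(0.0, 1.5, size=(100_000, p_true.size))
p_observed = sigmoid(log_odds + noise).mean(axis=0)
```

Each observed proportion lies strictly between its noise-free value and 0.5, and the ordering of the gambles is preserved: the noise blurs the strength of preferences without reversing them, matching the less extreme but still informative choice proportions in choices13k.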
Discussion
The findings challenge the notion that simply training NNs on large datasets automatically leads to the discovery of generalizable theories of human decision-making. The results suggest that the superior performance of a previously reported NN model on the choices13k dataset may be partially attributable to its capturing the structured decision noise inherent in the online data. The study highlights the complex interplay between dataset properties (e.g., online versus laboratory setting, potential for decision noise), model complexity, and the resulting predictive accuracy and generalizability. The authors stress the importance of integrating theory-driven reasoning with data-driven methods in the development and evaluation of models of human decision-making. The limited interpretability of the SHAP values, despite SHAP's popularity as an XAI method, underscores the ongoing challenges of understanding the decision processes of complex models.
Conclusion
While large-scale datasets and complex NNs offer exciting possibilities for studying human decision-making, this study emphasizes that data collection context significantly influences results. The superior performance of models trained on large online datasets does not necessarily imply a superior understanding of general decision-making principles. Future research should focus on richer datasets containing individual-level decisions and experimental contexts, examining the impact of different data collection methods on model generalizability. The integration of theory-driven reasoning and data-driven modeling remains crucial for advancing our understanding of human decision-making.
Limitations
The study focuses on a limited subset of risky economic choices (binary gambles with small monetary stakes). The generalizability of findings to choices involving more complex gambles, larger monetary amounts, or real-world scenarios needs further investigation. The interpretation of SHAP values remains challenging, highlighting the difficulty of extracting easily interpretable explanations from complex black-box models. While a hybrid model incorporating decision noise improved predictions, other sources of dataset bias or differences in decision processes may also play a role.