Introduction
Systematic compositionality, the ability to understand and produce novel combinations of known elements, is a defining characteristic of human language and thought. Fodor and Pylyshyn (1988) famously argued that artificial neural networks fundamentally lack this capacity, posing a significant challenge to their viability as cognitive models. This systematicity challenge has fueled decades of debate. Although counterarguments have pointed to the imperfectly systematic nature of human compositional behavior and to the potential of more sophisticated architectures, modern neural networks still struggle with rigorous tests of systematic generalization. This study addresses the challenge directly by investigating whether neural networks can be trained to achieve human-like systematic generalization.
Literature Review
The debate surrounding systematicity in neural networks has spanned more than 35 years, with counterarguments centering on two points. First, some researchers have argued that human compositional abilities are less systematic and rule-like than Fodor and Pylyshyn (1988) proposed, emphasizing the role of inductive biases and probabilistic reasoning. Second, advances in neural network architectures have shown potential for improved systematicity. Nevertheless, modern networks, including those powering recent progress in natural language processing, consistently fall short on systematic generalization tests. Numerous studies (Lake & Baroni, 2018; Ettinger et al., 2018; Bahdanau et al., 2019; Keysers et al., 2019; Yu & Ettinger, 2020; Kim & Linzen, 2020; Hupkes et al., 2020; Press et al., 2022) have documented these limitations, demonstrating the persistent difficulty of achieving human-like compositional skills in artificial systems.
Methodology
This research introduces a novel meta-learning approach called Meta-Learning for Compositionality (MLC) to train neural networks for compositional skills. Unlike approaches that rely on added symbolic machinery or hand-designed internal representations, MLC guides training through a dynamic stream of compositional tasks (few-shot learning episodes). Each episode presents the network with a set of study examples (input/output pairs) and a query instruction, all provided as simultaneous input; the network must infer the meanings of words from the study examples and produce outputs for novel, compositionally complex query instructions. The implementation uses a standard transformer for sequence-to-sequence learning, relying only on its ordinary self-attention mechanisms. The meta-learning aspect lies in optimizing the network over dynamically changing episodes, each defined by a randomly generated latent grammar, rather than over a fixed dataset.
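To make the episode structure concrete, the following is a minimal sketch of MLC-style episode generation. The word and symbol inventories, the function words, and the sampling scheme are invented for illustration; they are not the paper's actual grammars, which are richer and randomly generated at larger scale.

```python
import random

WORDS = ["dax", "wif", "lug", "zup"]          # hypothetical primitive words
SYMBOLS = ["RED", "BLUE", "GREEN", "YELLOW"]  # hypothetical output symbols
FUNCTIONS = {                                 # hypothetical function words
    "twice":  lambda xs: xs * 2,
    "thrice": lambda xs: xs * 3,
}

def sample_episode(n_study=4):
    """Sample one few-shot episode from a freshly drawn latent grammar."""
    # A new random lexicon each episode: the network can never memorize
    # word meanings; it must infer them from the study examples.
    lexicon = dict(zip(WORDS, random.sample(SYMBOLS, len(WORDS))))

    def interpret(instruction):
        tokens = instruction.split()
        out = [lexicon[tokens[0]]]
        if len(tokens) > 1:                   # e.g. "dax twice" -> dax's
            out = FUNCTIONS[tokens[1]](out)   # output, repeated twice
        return out

    # Study examples: the primitives plus one composed instruction.
    study = [(w, interpret(w)) for w in random.sample(WORDS, n_study)]
    func = random.choice(list(FUNCTIONS))
    composed = f"{study[0][0]} {func}"
    study.append((composed, interpret(composed)))

    # Query: a novel composition never shown among the study examples.
    query = f"{study[1][0]} {func}"
    return study, query, interpret(query)

study, query, target = sample_episode()
for instruction, output in study:
    print(f"study: {instruction!r} -> {output}")
print(f"query: {query!r} -> expected {target}")
```

In MLC the study examples and the query are concatenated into a single input sequence for the transformer, and the loss is taken on the query's output; because the lexicon changes every episode, only a compositional inference strategy generalizes across episodes.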
Human participants were evaluated alongside MLC using instruction-learning tasks in a pseudolanguage. Two types of tasks were employed: a few-shot learning paradigm involving a curriculum of study instructions followed by query instructions, and an open-ended task in which participants produced outputs for unfamiliar instructions without any prior examples. Responses were analyzed to quantify systematic generalization and to identify prevalent inductive biases: one-to-one mapping (each input word yields one output symbol), iconic concatenation (output order mirrors word order), and mutual exclusivity (novel words receive novel meanings). Seven models were compared, including probabilistic symbolic models with and without these inductive biases, basic sequence-to-sequence (seq2seq) models, and variants of MLC. Performance was assessed by each model's ability to predict human responses, including errors, across both the few-shot and open-ended tasks, using the log-likelihood of human behavior under the model's predictions. Additionally, MLC was evaluated on standard machine-learning benchmarks (SCAN and COGS) known to challenge compositional generalization.
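As a hedged illustration of the comparison metric, the sketch below scores two hypothetical models by the summed log-likelihood of observed human responses under each model's predictive distribution. All instructions, responses, and probabilities are invented placeholders, not values from the study.

```python
import math

def log_likelihood(model_probs, human_responses, eps=1e-6):
    """Summed log P(response | instruction, model) over human responses.

    model_probs maps (instruction, response) pairs to the probability the
    model assigns; eps floors unpredicted responses to avoid -infinity.
    """
    return sum(
        math.log(model_probs.get(pair, eps)) for pair in human_responses
    )

# Invented human responses to one instruction (the third is an error).
human = [("dax twice", "RED RED"), ("dax twice", "RED RED"),
         ("dax twice", "RED")]

# Invented predictive distributions for two hypothetical models.
mlc_like     = {("dax twice", "RED RED"): 0.80, ("dax twice", "RED"): 0.10}
seq2seq_like = {("dax twice", "RED RED"): 0.30, ("dax twice", "RED"): 0.05}

for name, probs in [("MLC-like", mlc_like), ("seq2seq-like", seq2seq_like)]:
    print(f"{name}: {log_likelihood(probs, human):.3f}")
```

A model that assigns probability mass to humans' characteristic errors, not just to the algebraically correct answer, scores higher on this metric.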
Key Findings
The results show that MLC matches, and in some cases exceeds, human systematic generalization across multiple tasks. In the few-shot instruction-learning task, MLC matched the algebraic (fully systematic) responses at a rate comparable to humans (82.4% vs. 80.7%), including generalization to output sequences longer than any seen during training (77.8% vs. 72.5% for humans). Notably, MLC also replicated characteristic human error patterns, such as one-to-one translations and iconic concatenations, indicating that it captures not just human successes but human mistakes.

The open-ended instruction task further confirmed MLC's capacity to capture human inductive biases: it produced the modal human response 65.0% of the time, reflecting the one-to-one, iconic-concatenation, and mutual-exclusivity biases commonly observed in human language learning. In the log-likelihood comparison, MLC predicted both few-shot and open-ended human responses better than the baseline and alternative models, including the probabilistic symbolic models and basic seq2seq transformers.

On the machine-learning benchmarks (SCAN and COGS), MLC achieved error rates below 1% on systematic lexical generalization tasks, far outperforming basic seq2seq models. Its success, however, is limited to generalization within the meta-training distribution; out-of-distribution tasks requiring greater productivity in generating novel structures remained a significant challenge.
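To make the three inductive biases concrete, here is a toy deterministic responder for the open-ended task. The vocabulary and symbol inventory are invented, and actual human (and MLC) behavior is probabilistic rather than deterministic, so this is illustrative only.

```python
SYMBOLS = ["RED", "BLUE", "GREEN", "YELLOW"]  # invented output inventory

def respond(instruction, lexicon):
    """Answer an instruction using only the three inductive biases."""
    output = []
    for word in instruction.split():      # iconic concatenation: outputs
        if word not in lexicon:           # follow the order of the words
            # Mutual exclusivity: a never-seen word takes an unused symbol.
            used = set(lexicon.values())
            lexicon[word] = next(s for s in SYMBOLS if s not in used)
        output.append(lexicon[word])      # one-to-one: one symbol per word
    return output

lexicon = {"dax": "RED"}                  # one known word
print(respond("dax wif", lexicon))        # ['RED', 'BLUE']
print(respond("wif dax lug", lexicon))    # ['BLUE', 'RED', 'GREEN']
```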
Discussion
This study provides compelling evidence that standard neural network architectures, when optimized with the proposed MLC method, can achieve human-like systematic generalization. The findings speak directly to the long-standing systematicity challenge: neural networks are not inherently incapable of compositional behavior but rather require appropriate training procedures. That MLC reproduces not only humans' correct responses but also their characteristic errors underscores its value as a model of human cognition. The comparison with the baseline models, including the probabilistic symbolic models, shows that MLC balances systematicity and flexibility, mirroring the interplay between rule-based processing and inductive biases observed in human language use. The strong accuracy on benchmark datasets further validates MLC's effectiveness at instilling compositional skills in machine-learning systems.
Conclusion
This research demonstrates that meta-learning, in the form of MLC, can train standard neural networks to achieve human-like systematic generalization. MLC's success in capturing both correct and erroneous human behavior, together with its strong benchmark performance, makes it a valuable tool both for modeling human cognitive abilities and for improving the compositional skills of machine-learning systems. Future work should explore extending MLC to out-of-distribution generalization, incorporating mechanisms for producing genuinely new symbols, and applying the approach to more complex natural-language tasks and other modalities.
Limitations
While MLC represents a significant advance, it does not resolve every aspect of the systematicity challenge. The current method struggles with concepts and generalizations that fall outside the meta-learning distribution, limiting its capacity to process entirely novel structures. MLC is also limited in capturing nuances of inductive biases that it was not explicitly optimized for. Further research is needed to extend its productivity and its generalization to genuinely novel situations, potentially by adding new mechanisms or modifying the meta-training procedure.