Introduction
Recent advancements in deep learning have yielded artificial neural networks (ANNs) that achieve human-level or even superior performance in various domains, including language processing. This progress facilitates comparisons between ANNs and human cognitive processes. Artificial Grammar Learning (AGL) provides an ideal framework for such comparisons, as it has well-established roots in both cognitive science and computer science. Formal Language Theory (FLT), particularly Chomsky's hierarchy, offers a theoretical framework for understanding grammar complexity, while existing human AGL studies provide a benchmark for comparing ANN performance. The research question centers on determining which ANN architecture, feedforward or recurrent, better captures human behavior in AGL, considering the established link between these architectures and unconscious/conscious processes in visual perception. The study aims to use comparable amounts of training for humans and ANNs, with networks built from fully-connected layers and trained by backpropagation (backpropagation through time for the recurrent networks). Furthermore, given AGL's established role in contrasting implicit and explicit learning, the study tests the hypothesis that these learning modes correspond respectively to feedforward and recurrent architectures.
Literature Review
Several theories attempt to explain human performance in AGL. Reber's initial hypothesis proposed implicit rule learning, but subsequent studies highlighted potential confounds, leading to revised accounts such as the microrule and chunking hypotheses. These suggest that learners focus on subsets of rules (e.g., bigrams, trigrams) or frequent chunks. Another perspective views human learning as a recurrent process, decoding relevant features from previous items in the sequence. However, most computational models have been trained on large datasets, often with training-test overlap, hindering direct comparison with human cognition, which typically involves exposure to only a few dozen examples. The distinction between implicit and explicit learning is another critical area in AGL research. While AGL is often considered implicit (automatic, non-intentional), some studies demonstrate implicit learning within intentional learning contexts. The existence of distinct implicit and explicit systems in the brain is proposed, with implicit processes possibly handling larger amounts of information at the cost of reduced flexibility, and explicit processes focusing on hypothesis testing, constrained by working memory limitations. Previous work indicates that complex grammars are more likely processed implicitly, whereas simpler ones lend themselves to explicit learning.
Methodology
The study involved four experiments using four artificial grammars spanning three levels of Chomsky's hierarchy: two regular grammars (A and B), one context-free, and one context-sensitive. The regular grammars generated sequences of 2 to 12 items, with vocabularies of 4 and 5 letters respectively. The context-free and context-sensitive grammars used a symmetry-based approach, with vocabularies of 10 letters and sequence lengths of 4, 6, or 8.

Human participants (N = 56, 31 female, age 25.4 ± 4.7 years) took part in the four experiments, each using a different grammar (n = 15 per grammar, except n = 11 for the context-sensitive grammar). The experimental design was identical across grammars. Each trial began with a fixation cross, followed by a letter string. Participants classified strings as correct or incorrect using key presses and received visual feedback (green for correct, red for incorrect); accuracy and reaction times were recorded. Each participant completed a 1-hour session of 10 blocks: 8 implicit blocks (480 trials in total), followed by a questionnaire, a 20-trial explicit block (with access to the grammar rules), and a 20-trial memory block (without access to the rules). The questionnaires assessed explicit knowledge, with different questions for each grammar type.

For the ANNs, feedforward and recurrent architectures with fully-connected layers were used. A parameter search identified, for each grammar, the network whose performance was closest to human behavior. Four parameter spaces (1400, 7900, 31000, and 122000 parameters) were explored, varying the number of layers and the learning rate. Networks were trained on 500 sequences (comparable to the amount of human training), validated on 100 sequences, and tested on 200; the network closest to human performance was selected for each grammar, and learning curves were generated by varying the training set size from 100 to 500 sequences. Feedforward networks used ReLU activation functions (except for the sigmoid output neuron), binary cross-entropy loss, and stochastic gradient descent with Nesterov momentum. Recurrent networks used fully recurrent layers, a sigmoid output neuron, binary cross-entropy loss, and RMSprop optimization. One epoch (500 trials) and a batch size of 15 were used for both architectures.

Data analysis for the human experiments used Bayesian ANOVA to test for learning effects on accuracy and reaction times, with blocks and subjects as factors, and to compare the explicit and memory blocks with the implicit blocks. Questionnaire results (sensitivity, specificity, confidence) were analyzed using Bayesian methods. For the ANNs, the distance between human and network performance was calculated for each grammar and parameter space, and learning curves were compared using Bayesian ANOVA.
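To make the symmetry-based designs concrete, the sketch below shows one plausible way to generate such sequences, assuming the 10-letter vocabulary is split into two paired halves, with mirror symmetry for the context-free grammar and crossed (copy) dependencies for the context-sensitive grammar. The specific letters and pairings are illustrative assumptions, not the study's exact materials.

```python
import random

# Illustrative sketch only: one plausible reading of the "symmetry-based"
# context-free and context-sensitive grammars described above. Splitting the
# 10-letter vocabulary into two paired halves, and the specific letters, are
# assumptions rather than the study's exact materials.
FIRST_HALF = list("ABCDE")
SECOND_HALF = list("FGHIJ")                  # F pairs with A, G with B, ...
PAIR = dict(zip(FIRST_HALF, SECOND_HALF))

def context_free_sequence(length):
    """Mirror symmetry (e.g., A B | pair(B) pair(A)): a context-free dependency."""
    half = [random.choice(FIRST_HALF) for _ in range(length // 2)]
    return half + [PAIR[x] for x in reversed(half)]

def context_sensitive_sequence(length):
    """Crossed copy dependency (e.g., A B | pair(A) pair(B)): context-sensitive."""
    half = [random.choice(FIRST_HALF) for _ in range(length // 2)]
    return half + [PAIR[x] for x in half]

if __name__ == "__main__":
    for n in (4, 6, 8):                      # sequence lengths used in the study
        print(n, context_free_sequence(n), context_sensitive_sequence(n))
```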
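The two network families can be sketched as follows; this is a minimal illustration in tf.keras, assuming a padded one-hot input encoding and example layer widths, learning rates, and momentum value, while the activations, loss, optimizers, single epoch, and batch size of 15 follow the description above.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch of the two architecture families described above, written with
# tf.keras for concreteness. The padded one-hot input encoding, layer widths,
# learning rates, and momentum value are illustrative assumptions; the
# activations, loss, optimizers, single epoch, and batch size of 15 follow the
# description in the text.
MAX_LEN, VOCAB = 12, 5   # e.g., a regular grammar with up to 12 items and 5 letters

def feedforward_model(n_hidden=2, units=32, lr=0.01):
    model = tf.keras.Sequential([tf.keras.Input(shape=(MAX_LEN, VOCAB)),
                                 tf.keras.layers.Flatten()])
    for _ in range(n_hidden):
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))   # grammatical vs. not
    model.compile(loss="binary_crossentropy",
                  optimizer=tf.keras.optimizers.SGD(learning_rate=lr,
                                                    momentum=0.9, nesterov=True),
                  metrics=["accuracy"])
    return model

def recurrent_model(n_hidden=1, units=32, lr=0.001):
    model = tf.keras.Sequential([tf.keras.Input(shape=(MAX_LEN, VOCAB))])
    for i in range(n_hidden):
        # All but the last recurrent layer pass full sequences to the next one.
        model.add(tf.keras.layers.SimpleRNN(units,
                                            return_sequences=(i < n_hidden - 1)))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy",
                  optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr),
                  metrics=["accuracy"])
    return model

# Training regime described above: 500 sequences, one epoch, batch size 15.
X = np.random.rand(500, MAX_LEN, VOCAB).astype("float32")   # placeholder inputs
y = np.random.randint(0, 2, size=(500, 1))                  # placeholder labels
for model in (feedforward_model(), recurrent_model()):
    model.fit(X, y, epochs=1, batch_size=15, verbose=0)
```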
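The model-selection step (choosing, per grammar, the network closest to human performance) can be sketched as below; the mean absolute accuracy difference across the learning curve is an assumed distance measure, as the summary does not specify the exact one.

```python
import numpy as np

# Hypothetical model-selection step: pick, for each grammar, the candidate
# network whose accuracy profile lies closest to the human one across the
# training-set sizes (100-500 sequences). The distance measure is assumed.
def distance_to_human(human_acc, network_acc):
    """Mean absolute accuracy difference across the learning curve."""
    return float(np.mean(np.abs(np.asarray(human_acc) - np.asarray(network_acc))))

def select_closest_network(human_acc, candidate_curves):
    """candidate_curves maps a hyperparameter setting to its accuracy curve."""
    return min(candidate_curves,
               key=lambda setting: distance_to_human(human_acc, candidate_curves[setting]))

# Example with two hypothetical candidates evaluated at each training-set size.
human = [0.55, 0.60, 0.63, 0.65, 0.68]
candidates = {"2 layers, lr=0.01": [0.70, 0.78, 0.82, 0.85, 0.88],
              "1 layer, lr=0.001": [0.54, 0.58, 0.62, 0.66, 0.69]}
print(select_closest_network(human, candidates))   # -> "1 layer, lr=0.001"
```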
Key Findings
Human participants learned the grammars above chance in all four experiments (Bayes factors >> 100 for the BLOCK factor), while reaction times remained stable throughout (no reliable effects). Bayesian ANOVA provided strong evidence for differences among the implicit, explicit, and memory blocks (all BF >> 10), with accuracy in the memory blocks similar to that in the explicit blocks (except for grammar B, BF = 7272), suggesting successful rule learning. Questionnaire analysis revealed differences in explicit knowledge, with better performance for grammar A than for grammar B (sensitivity BF > 100). For the context-free and context-sensitive grammars, participants who correctly identified the grammar rules successfully detected wrong letters in novel sequences.

The ANN parameter search showed that recurrent networks performed closer to human behavior than feedforward networks. In the 31000-parameter space, feedforward networks performed best with fewer layers, while recurrent networks performed best with lower learning rates; averaging across all grammars confirmed both patterns. Comparing learning curves with a Bayesian ANOVA over trials, agents (humans, feedforward, recurrent), and grammars revealed an effect of each factor and a very strong interaction between agent and grammar (BF = 2.3 × 10^14). Post-hoc analysis showed that, except for grammar B, recurrent architectures more closely matched human performance. An analysis by sequence length showed an effect of length for all grammars, with shorter sequences yielding better performance; again, recurrent networks generally resembled human performance more closely than feedforward ones. Finally, an analysis of 10 regular grammars demonstrated that recurrent networks consistently outperformed feedforward networks, with the difference decreasing as grammar complexity increased, implying that the advantage of recurrent networks grows with grammar simplicity.
Discussion
The study's findings indicate that recurrent neural networks model human AGL better than feedforward networks, regardless of where the grammar sits in Chomsky's hierarchy. This supports the importance of recursion in cognitive processes such as language. The results further suggest that feedforward networks may capture implicit learning (in complex grammars), while recurrent networks better capture explicit learning (in simpler grammars). The study also highlights the importance of using comparable training data for humans and ANNs when making such comparisons; previous studies often used far larger training datasets, making direct comparison with human performance difficult. The present methodology allows a direct comparison of learning dynamics, showing that the learning curves of recurrent networks match human learning curves better than those of feedforward networks. The observed similarity between recurrent network behavior and human behavior may reflect underlying neuronal dynamics in the brain, although neuroimaging studies are needed to investigate this further. The impact of the learning rate in recurrent networks is another point of interest, consistent with previous research on learning in amnesic and healthy subjects. Overall, the results provide strong evidence for the hypothesis that explicit knowledge is better modeled by recurrent architectures.
Conclusion
This study demonstrates that recurrent neural networks mimic human artificial grammar learning more closely than feedforward networks, suggesting a crucial role for recursion in human language processing. The finding that the recurrent advantage is greatest for simpler, more explicitly learned grammars further supports this conclusion. Future research should investigate more ecologically valid grammars to further explore the relationship between neural network architectures and natural language acquisition.
Limitations
The study used artificial grammars, which may not fully capture the complexity of natural language, and the number of grammars tested, while substantial, may not cover the full spectrum of linguistic complexity. Future studies could explore a wider range of grammars and different variations within each grammar type. Additionally, the specific implementations of the feedforward and recurrent networks may have influenced the results; future work might examine the effects of different network architectures and hyperparameters. The limited training dataset, while designed to mimic human learning conditions, may also constrain the networks' ability to learn more complex patterns. Including LSTM models trained on more data would add valuable information.