Introduction
STEM education, integrating science, technology, engineering, and mathematics, aims to develop learners' professional knowledge and skills, fostering critical thinking and adaptability. MOOCs (Massive Open Online Courses) offer widespread access to STEM education, but suffer from high dropout rates, hindering learning sustainability and effectiveness. This study investigates the problem of dropout in STEM MOOCs, focusing on the analysis of massive learning behavior data generated throughout the entire learning period. The researchers hypothesize that a data-driven approach incorporating both explicit (e.g., demographics) and implicit (e.g., patterns of activity) features of learner behavior can accurately predict dropout and inform effective interventions. The significance of this study lies in its potential to improve the design and delivery of STEM MOOCs, increasing accessibility and completion rates for a wider range of learners. Existing research acknowledges the limitations of MOOCs in addressing individual learning needs and promoting effective teacher-student and student-student interaction. This study aims to fill this gap by developing a predictive model that can identify at-risk learners and suggest timely interventions.
Literature Review
Previous research highlights the advantages and disadvantages of MOOCs for various stakeholders. While MOOCs provide extensive resources and access, they often fail to cater to individual learning needs and preferences, leading to negative learning experiences and dropout. Studies have also pointed out the challenges of teacher-learner interaction in large-scale online settings, which affects learning effectiveness. Although some research exists on dropout prediction in MOOCs, much of it lacks comprehensive analysis of massive learning behavior instances, particularly in STEM, which presents unique challenges due to its interdisciplinary nature and its reliance on collaboration and practical application. Existing methods often focus on single-course analysis or neglect the temporal aspects of learning behavior. This study builds on previous work by employing a more comprehensive approach that considers both explicit and implicit features within a temporal sequence framework.
Methodology
The study utilizes a large-scale learning behavior dataset from the Open University (UK), focusing on three STEM courses (DDD, EEE, FFF) across four learning periods (2013B, 2013J, 2014B, 2014J). The dataset includes learner demographics, learning accumulation (previous attempts, studied credits, highest education), assessment results, and various interactive learning activities (forum discussions, quizzes, resource access, etc.). Data standardization involved handling missing values and transforming categorical variables; a brief preprocessing sketch follows the question list below. The research addresses four key questions:
1. Does demographic information influence dropout trends?
2. Do topological paths between interactive learning activities influence dropout trends?
3. Does learning accumulation influence dropout trends?
4. Do assessment results influence dropout trends?
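The paper does not spell out the standardization steps beyond missing-value handling and categorical encoding, so the following is a minimal sketch under those assumptions, using standard pandas/scikit-learn tooling and hypothetical column names rather than the authors' actual pipeline.

```python
# Minimal preprocessing sketch; column names and imputation choices are
# hypothetical, not the authors' exact pipeline.
import pandas as pd
from sklearn.preprocessing import StandardScaler

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Impute missing numeric values with the column median.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    # One-hot encode categorical variables such as highest_education or imd_band.
    cat_cols = df.select_dtypes(include="object").columns
    df = pd.get_dummies(df, columns=list(cat_cols), dummy_na=True)
    # Scale numeric features to zero mean and unit variance.
    df[num_cols] = StandardScaler().fit_transform(df[num_cols])
    return df
```

Median imputation and one-hot encoding are common defaults; the authors may have made different choices.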
A novel dropout prediction model, STEM_DP, was developed. The model couples a convolutional neural network (CNN) for feature extraction with a long short-term memory (LSTM) recurrent network for sequential data: the CNN extracts local features from the learning behavior data, while the LSTM tracks how those features change over time. Explicit and implicit features are combined to predict dropout. Training optimizes a loss function that accounts for both explicit and implicit feature losses, with gradients propagated through the LSTM to capture changes across the temporal sequence. Evaluation metrics include precision, recall, F1-score, and AUC. The dataset was split into training and testing sets (80/20), and the model's performance was evaluated across the different learning periods and courses.
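To make the architecture concrete, here is an illustrative PyTorch sketch of a CNN-plus-LSTM dropout predictor. The layer sizes, the daily-activity input layout, and the single binary cross-entropy loss are assumptions for illustration, not the published STEM_DP configuration (which additionally weights explicit- and implicit-feature losses).

```python
# Illustrative CNN+LSTM dropout predictor; hyperparameters are assumptions,
# not the exact STEM_DP configuration described in the paper.
import torch
import torch.nn as nn

class StemDropoutNet(nn.Module):
    def __init__(self, n_activity_types: int, n_explicit: int):
        super().__init__()
        # 1-D convolution over daily activity counts (e.g., a 20-day window)
        # extracts local implicit-behavior features.
        self.conv = nn.Sequential(
            nn.Conv1d(n_activity_types, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # The LSTM tracks how those features evolve across the window.
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        # Explicit features (demographics, accumulation, assessment results)
        # are concatenated with the final LSTM state before classification.
        self.head = nn.Sequential(
            nn.Linear(64 + n_explicit, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, activity_seq: torch.Tensor, explicit: torch.Tensor) -> torch.Tensor:
        # activity_seq: (batch, days, n_activity_types); explicit: (batch, n_explicit)
        x = self.conv(activity_seq.transpose(1, 2)).transpose(1, 2)
        _, (h_n, _) = self.lstm(x)
        combined = torch.cat([h_n[-1], explicit], dim=1)
        return self.head(combined).squeeze(1)  # logits; apply sigmoid for probabilities

criterion = nn.BCEWithLogitsLoss()  # dropout vs. non-dropout label
```

Precision, recall, F1-score, and AUC on the held-out 20% split can then be computed with sklearn.metrics (precision_score, recall_score, f1_score, roc_auc_score).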
Key Findings
STEM_DP demonstrated high prediction accuracy (all four evaluation metrics above 0.900), suggesting its effectiveness in predicting dropout. Analysis showed that younger learners and those from lower IMD (Index of Multiple Deprivation) bands were more likely to drop out. While demographic information as a whole did not strongly correlate with dropout, specific variables such as age and IMD band did. The study identified key topological paths of interactive learning activities associated with dropout and non-dropout learners: early participation in forum discussions was crucial, followed by activities such as quizzes, resource access, and collaborative tools. Learning accumulation (prior attempts, studied credits, highest education) was also negatively correlated with dropout. Assessment results did not directly affect dropout from the current course but significantly influenced participation and success in future courses; learners who had previously failed assessments were more likely to drop out of subsequent courses. A 20-day timeframe was found to be optimal for dropout prediction.
Discussion
The findings highlight the importance of considering both explicit and implicit features of learning behavior, along with temporal sequences, for accurate dropout prediction in STEM MOOCs. The high accuracy of STEM_DP demonstrates the feasibility of using a data-driven approach to identify at-risk learners. The identified factors – demographics, learning behavior patterns, accumulation, and assessment history – provide valuable insights into the causes of dropout and can inform targeted interventions. The 20-day prediction window suggests a need for early intervention strategies. The study contributes to the field by offering a sophisticated model for dropout prediction and highlighting the importance of understanding the temporal dynamics of learner behavior.
Conclusion
This study presented STEM_DP, a novel dropout prediction model for STEM MOOCs that achieves high accuracy. It highlights the importance of considering explicit and implicit behavioral features within a temporal framework. The findings offer valuable insights for designing effective interventions, emphasizing the need for early identification of at-risk students and targeted support based on individual learner characteristics and behavioral patterns. Future research could explore the model's performance with different datasets and educational contexts. Furthermore, investigating the effectiveness of various intervention strategies based on the model's predictions would be valuable.
Limitations
The study used data from a single MOOC platform, potentially limiting the generalizability of the findings to other platforms. The dataset may not fully capture the complexity of all factors influencing dropout. While the model demonstrated high accuracy, the real-world impact of interventions based on its predictions requires further investigation. The study's focus on the explicit features available in the provided dataset might also overlook other potentially relevant factors.