FEW questions, many answers: using machine learning to assess how students connect food-energy-water (FEW) concepts

E. A. Royse, A. D. Manzanares, et al.

Unlock the potential of machine learning in education! This research examined how machine learning text classification can assess students' understanding of the complex Food-Energy-Water Nexus, finding strong human-machine agreement in identifying key concepts in student responses. Drawing on responses from ten institutions, the findings highlight students' relative strength in explaining water usage but also reveal difficulties in reasoning about trade-offs.

Introduction
Addressing complex, interconnected global challenges requires interdisciplinary education that fosters systems thinking. Assessing student understanding in these programs is difficult, particularly when evaluating complex synthesis concepts like the Food-Energy-Water (FEW) Nexus. Traditional assessments, such as constructed responses (CR), are time-consuming to grade. This study explores the use of machine learning (ML) text classification as a tool to efficiently and effectively assess student understanding of FEW Nexus concepts and their ability to apply systems thinking. The research questions are: (1) Can ML models identify instructor-determined key concepts in student responses? (2) What do college students understand about FEW interconnections and systems thinking within their responses? The FEW Nexus, as a coupled systems approach, provides a practical framework for exploring systems thinking in environmental education. Current assessment tools are often inadequate for evaluating this higher-order thinking. Machine learning offers a potential solution by automating the scoring of open-ended responses, allowing for large-scale analysis of student work and providing insights into their understanding of complex relationships.
Literature Review
The paper cites the Next Generation Science Standards (NGSS) as a framework for linking scientific disciplines through cross-cutting concepts. It highlights the challenges of assessing interdisciplinary learning, particularly systems thinking, and points to frameworks such as the UN Sustainable Development Goals (SDGs), Resilience Thinking, and the UN-PRME. The paper discusses the use of constructed responses (CRs) for assessing interdisciplinary connections, acknowledging the difficulty of designing and grading such assessments, and notes the limitations of existing concept inventories (CIs) in capturing interdisciplinary concepts and higher-level thinking. The use of machine learning (ML) and natural language processing (NLP) in education is reviewed, focusing on automated scoring of student responses and the development of predictive scoring models. The FEW Nexus is presented as an ideal context for exploring complex systems content, given its global significance and relevance to students' daily lives. The paper notes a lack of assessment tools targeting advanced systems-level relationships within the FEW Nexus and emphasizes the need for assessments that go beyond memorization of facts.
Methodology
The study employed a modified question development cycle integrating question design, rubric development, human coding, and ML model development. Two assessment items were created: one asking students to identify sources and explain connections within FEW systems (the Reservoir item), and another asking them to evaluate outcomes and compare trade-offs (the Biomass item). Data were collected from introductory IES courses at ten diverse institutions across the US. Analytic rubrics were developed and iteratively refined through multiple rounds of human coding, with a target of at least 85% inter-rater agreement. Supervised ML text classification was then applied, with an ensemble of eight ML algorithms trained for each question item; a simplified sketch of this scoring pipeline appears below. Model performance was evaluated using accuracy, Cohen's kappa, sensitivity, specificity, and F1 score. Extended model tuning strategies were used to address class imbalance and improve performance, including additional feature engineering (synonym substitution, longer n-grams), data rebalancing, creating dummy responses, and merging rubric bins. Finally, the study analyzed the co-occurrence of rubric codes to assess student understanding of FEW interconnections and systems thinking: for the Reservoir item, co-occurrence gauged the level of understanding, while for the Biomass item, expertise levels (1-4) were defined by specific combinations of codes.
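
To make the scoring pipeline concrete, here is a minimal sketch of one binary classifier for a single rubric bin, built with scikit-learn. The training data, bin definition, and three-model hard-voting ensemble are illustrative assumptions; the study's actual eight-algorithm ensemble and feature engineering are not reproduced here.

```python
# Minimal sketch: one binary classifier per rubric bin (hypothetical data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# Hypothetical human-coded responses: 1 = rubric concept present, 0 = absent.
responses = [
    "Hydropower from the reservoir supplies energy to irrigate crops.",
    "The dam stores water that farms downstream use for food production.",
    "Energy is made at the dam.",
    "Water evaporates from the lake.",
] * 25  # padded so the train/test split has enough examples
labels = [1, 1, 0, 0] * 25

X_train, X_test, y_train, y_test = train_test_split(
    responses, labels, test_size=0.25, random_state=42, stratify=labels
)

# Unigrams and bigrams as features; majority vote across three classifiers.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
            ("svm", LinearSVC()),
        ],
        voting="hard",
    ),
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Human-machine agreement metrics of the kind reported in the study.
print("accuracy:", accuracy_score(y_test, pred))
print("kappa:   ", cohen_kappa_score(y_test, pred))
print("F1:      ", f1_score(y_test, pred))
```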
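The co-occurrence analysis can likewise be sketched as a simple pairwise tally over each response's set of assigned codes; the code names below are hypothetical stand-ins for the study's actual rubric bins.

```python
# Minimal sketch: counting which rubric codes co-occur within responses.
from collections import Counter
from itertools import combinations

# Hypothetical per-response code assignments from human or ML scoring.
coded_responses = [
    {"hydropower", "energy_for_agriculture"},
    {"hydropower", "energy_for_infrastructure"},
    {"hydropower", "energy_for_agriculture", "water_beyond_hydro"},
    {"water_beyond_hydro"},
]

pair_counts = Counter()
for codes in coded_responses:
    for pair in combinations(sorted(codes), 2):
        pair_counts[pair] += 1

# Most frequent pairs suggest which FEW concepts students link together.
for (a, b), n in pair_counts.most_common():
    print(f"{a} + {b}: {n}")
```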
Key Findings
A total of 26 text classification models were developed (11 for the Reservoir item and 15 for the Biomass item). Model accuracy ranged from 0.755 to 0.992, with Cohen's kappa values reflecting varying degrees of human-machine agreement. The Reservoir models generally outperformed the Biomass models, and several tuning strategies were needed to reach acceptable performance for many Biomass rubric bins; merging rubric bins improved performance in several cases. For the Reservoir item, co-occurrence analysis revealed frequent connections between hydropower generation and energy use in agriculture and infrastructure, but fewer connections between water uses beyond hydropower and energy production. For the Biomass item, a higher proportion of students could explain changes in water usage than could discuss trade-offs, and expert-level responses were infrequent for both items. The analysis of co-occurring codes showed how students connect FEW concepts, ranging from novice-level understanding of individual components to more expert-level comprehension of trade-offs and interrelationships.
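
As an aside on why Cohen's kappa is reported alongside accuracy: with imbalanced rubric bins, a model can score high accuracy largely by chance, while kappa corrects for chance agreement. The toy numbers below are made up for illustration and are not from the study.

```python
# Illustrative only: accuracy vs. chance-corrected kappa on a rare rubric bin.
from sklearn.metrics import accuracy_score, cohen_kappa_score

human   = [0] * 90 + [1] * 10   # only 10% of responses earn the code
machine = [0] * 95 + [1] * 5    # model recovers half of the positives

print(accuracy_score(human, machine))     # 0.95: looks strong
print(cohen_kappa_score(human, machine))  # ~0.64: weaker once chance is removed
```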
Discussion
The findings demonstrate the potential of ML text classification for assessing complex systems thinking in the context of the FEW Nexus. The study highlights the iterative nature of ML model development and the importance of careful rubric design and refinement. The challenges encountered, particularly with the Biomass item, underscore the difficulty of assessing higher-order thinking skills, especially those involving trade-offs and multiple interconnections. The high variation in student responses, reflecting diverse literacy abilities and levels of FEW understanding, poses a further challenge. Co-occurrence analysis provided insights into student understanding and demonstrated the value of considering multiple dimensions when evaluating systems thinking. The relative success of the Reservoir models compared with the Biomass models further suggests that task complexity affects ML model performance. Although the models showed varying degrees of success, the overall distribution of student responses across expertise levels approximated the human-assigned distribution.
Conclusion
This research takes initial steps toward ML-based assessment tools for evaluating complex systems thinking in interdisciplinary environmental education. The study highlights the potential of ML to automate assessment and provide nuanced insights into student understanding. Future research should focus on developing more robust rubrics, refining model tuning strategies, and exploring generative AI for enhancing assessment design and analysis. It is also important to weigh the substantial human effort involved in developing these models and to disseminate the final products to the broader community for greater impact and efficiency.
Limitations
The study's limitations include the limited number of participating institutions, the potential for bias in human coding, and the fact that some models did not achieve full agreement with human scoring. The complexity of the assessment targets and the diversity of student responses also pose challenges for ML model development. The study focused on a specific set of assessment items and may not generalize to other contexts or assessment types. The use of code co-occurrence as a proxy for systems thinking may need further refinement, and the study did not explore demographic differences in student responses.