
A systematic review of AI literacy scales

T. Lintner

This summary presents a systematic review by Tomáš Lintner that uses the COSMIN tool to analyze the quality of AI literacy scales. The review offers guidance for researchers choosing an instrument, setting out the strengths and weaknesses of scales assessed across diverse populations.


Introduction

The growing integration of artificial intelligence (AI) into medicine, education, science, and many industries has created demand for AI literacy. AI's impact extends beyond the job market: it shapes how people process information and contributes to the rise of deepfakes and misinformation. AI literacy, a relatively new concept, is often regarded as an advanced form of digital literacy that encompasses understanding, interacting with, and critically evaluating AI systems and their outputs. High-quality assessment instruments are crucial for understanding and promoting the development of AI literacy, and this review addresses the need for them. Its objectives are to provide a comprehensive overview of available AI literacy scales, to critically assess their quality, and to offer guidance on selecting a scale based on its quality and the context of use.

Literature Review

The review focuses on existing AI literacy scales and their validation studies. It surveys different conceptualizations of AI literacy, which share common themes such as technical understanding, societal impact, and ethical considerations. While there is broad agreement on core competencies, the conceptualizations differ in how they treat higher-order skills such as AI creation and evaluation. The literature emphasizes the need to integrate AI literacy into education at various levels, but comparatively little attention has been paid to developing and understanding suitable assessment instruments.

Methodology

This systematic review followed the PRISMA 2020 guidelines and was preregistered. Searches were conducted in Scopus and arXiv using keywords related to AI literacy and assessment scales. To be included, a study had to develop or revalidate an AI literacy scale, provide the full item list, describe how the items were formulated, describe the participants, and detail the validation techniques.

Extracted data included author names, publication dates, scale type, item characteristics, target population, validation methods, and factor structure.

Methodological quality was assessed using the COSMIN tool, covering content validity, structural validity, internal consistency, cross-cultural validity, reliability, measurement error, construct validity, and responsiveness, along with interpretability and feasibility. The GRADE approach was used to synthesize the evidence and assign a final quality rating. For scales with multiple validation studies, a random-effects meta-analysis was performed.
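The summary does not say which random-effects estimator the meta-analysis used. As a minimal sketch only, the code below pools per-study estimates with the DerSimonian-Laird estimator, one common choice. It assumes each validation study reports an estimate of the same quantity (for example, a reliability coefficient) together with its sampling variance; the function `dersimonian_laird` and the `PooledEstimate` type are hypothetical names. Coefficients such as Cronbach's alpha are often transformed before pooling; that step is left out here.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledEstimate:
    estimate: float  # random-effects pooled estimate
    se: float        # standard error of the pooled estimate
    tau2: float      # estimated between-study variance
    ci_low: float
    ci_high: float


def dersimonian_laird(estimates, variances, z=1.96):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model.

    Hypothetical helper: assumes one estimate and one sampling variance per study.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sum_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_re
    se = math.sqrt(1.0 / sum_re)
    return PooledEstimate(pooled, se, tau2, pooled - z * se, pooled + z * se)
```

For example, `dersimonian_laird([0.80, 0.90], [0.01, 0.01])` gives a pooled estimate of 0.85 with `tau2` equal to 0, because the two studies disagree less than their sampling error alone would predict. When studies disagree more, `tau2` grows and the weights become more equal across studies.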

Key Findings

The initial search yielded over 5,500 results, which were narrowed down to 22 studies validating 16 AI literacy scales. Most scales were self-report measures using Likert items; only a few were performance-based. The scales targeted diverse populations, including the general public, higher education students, secondary education students, and teachers.

Most scales demonstrated good structural validity and internal consistency, but evidence for content validity, reliability, construct validity, and responsiveness was limited for many of them. No scale had been tested for cross-cultural validity or measurement error. Interpretability indicators were often missing, and raw data were rarely available. Many scales lacked sufficient evidence for certain measurement properties, raising concerns about their reliability and validity.

The 16 scales reviewed are the AI literacy test, AI-CI, AILQ, AILS, AISES, Chan & Zhou's EVT-based instrument, the ChatGPT literacy scale, GSE-6AI, Hwang et al.'s instrument, Intelligent TPACK, Kim & Lee's instrument, MAILS, MAIRS-MC, Pinski & Benlian's instrument, SAIL4ALL, and SNAIL. For each scale, the review details its characteristics, psychometric properties, and COSMIN-based quality assessment, discussing the evidence for each measurement property. Table 1 summarizes the scales' characteristics, and Tables 2, 3, and 4 present the quality assessment, interpretability indicators, and feasibility indicators, respectively.
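To make the structural-validity and internal-consistency findings more concrete, the following sketch computes Cronbach's alpha from raw item scores and applies thresholds in the style of the COSMIN criteria for good measurement properties. It assumes a confirmatory factor model counts as sufficient if CFI or TLI exceeds 0.95, RMSEA is below 0.06, or SRMR is below 0.08, and that alpha of at least 0.70 for a unidimensional (sub)scale is rated only when structural validity is sufficient. These thresholds are assumed from the published COSMIN criteria and should be checked against the COSMIN user manual; all function names are hypothetical and not taken from the review.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix (list of lists)."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])  # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


def rate_structural_validity(cfi=None, tli=None, rmsea=None, srmr=None):
    """Hypothetical COSMIN-style rating of CFA fit.

    '+' if any reported index meets its assumed threshold,
    '-' if indices are reported but none meets it, '?' if none are reported.
    """
    if all(x is None for x in (cfi, tli, rmsea, srmr)):
        return "?"
    meets = (
        (cfi is not None and cfi > 0.95)
        or (tli is not None and tli > 0.95)
        or (rmsea is not None and rmsea < 0.06)
        or (srmr is not None and srmr < 0.08)
    )
    return "+" if meets else "-"


def rate_internal_consistency(alpha, structural_rating):
    """Alpha is only rated when structural validity is sufficient ('+')."""
    if structural_rating != "+":
        return "?"
    return "+" if alpha >= 0.70 else "-"
```

For instance, `cronbach_alpha([[1, 2], [2, 2], [3, 4]])` gives 12/13 (about 0.92); with `rate_structural_validity(cfi=0.97)` returning `'+'`, `rate_internal_consistency` rates that alpha as `'+'`. Tying the alpha rating to structural validity reflects the idea that alpha is only meaningful for a scale already shown to be unidimensional.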

Discussion

The review highlights the limited methodological rigor of many studies validating AI literacy scales. The lack of open data and the absence of information on key quality indicators (e.g., missing data, floor/ceiling effects) hinder replicability and interpretation. Based on COSMIN priorities, the review recommends appropriate scales for each target population (general population, higher education students, secondary education students, or teachers), identifying the scales with the most robust quality evidence for each group. It also discusses areas needing improvement, such as content validation, cross-cultural validity, and measurement error assessment.
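Two of the quality indicators the review found missing, the share of missing data and floor/ceiling effects, are straightforward to report when raw scores are available. The sketch below assumes a list of total scores with `None` marking a missing score, and flags a floor or ceiling effect when more than 15% of observed respondents sit at the lowest or highest possible score. The 15% cut-off is a commonly cited convention, not a rule stated in the review, and the function name `interpretability_indicators` is hypothetical.

```python
def interpretability_indicators(scores, min_score, max_score, threshold=0.15):
    """Report missing-data share and floor/ceiling effects for total scores.

    `scores` holds one total score per respondent; None marks a missing score.
    Floor/ceiling shares are computed among observed (non-missing) scores.
    """
    if not scores:
        raise ValueError("no respondents")
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("all scores are missing")
    missing_share = (len(scores) - len(observed)) / len(scores)
    floor_share = sum(1 for s in observed if s == min_score) / len(observed)
    ceiling_share = sum(1 for s in observed if s == max_score) / len(observed)
    return {
        "missing_share": missing_share,
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,      # assumed 15% convention
        "ceiling_effect": ceiling_share > threshold,  # assumed 15% convention
    }
```

For example, the scores `[0, 0, 5, 10, None]` on a 0 to 10 scale give a missing share of 0.2, a floor share of 0.5, and a ceiling share of 0.25, so both effects are flagged. Publishing these few numbers alongside a validation study would cost little and would directly address the gap the review describes.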

Conclusion

This systematic review identified a need for higher methodological rigor in the development and validation of AI literacy scales. The findings provide guidance for researchers and educators seeking reliable and valid instruments. Future research should focus on addressing the limitations identified, including improving the quality of scale validation studies, increasing the availability of open data, and developing more performance-based measures.

Limitations

The review was conducted by a single author, which may have introduced bias. The search was limited to Scopus and arXiv and may have missed studies published in the grey literature, although a reverse search helped mitigate this risk. The absence of a gold standard for AI literacy assessment also affects the interpretation of criterion validity.
