Introduction
Medical textbooks and other educational materials used in medical school curricula often lack adequate representation of diverse skin tones in images illustrating skin diseases. This underrepresentation, particularly of Fitzpatrick skin tones (FST) V and VI (brown and black skin tones), contributes to racial inequalities in healthcare, leading to delayed diagnoses and increased morbidity and mortality for patients of color. Previous analyses have relied on manual annotation, a time-consuming and error-prone process. This study introduces STAR-ED, a machine learning framework that automates the assessment of skin tone representation in medical educational materials, offering a scalable and objective way to identify and quantify representation biases for medical educators, publishers, and practitioners. Existing machine learning approaches to skin tone analysis have focused primarily on curated datasets and often relied on methods such as the individual typology angle (ITA), which is sensitive to lighting conditions. STAR-ED addresses these limitations with a model trained to classify FST directly from skin images, improving accuracy and robustness.
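For context, ITA is a single angle computed in the CIE LAB color space; because the lightness channel L* shifts with exposure and illumination, the angle (and any skin tone label derived from it) shifts too. The sketch below shows the standard ITA formula in Python; the 19-degree light/dark cut-off is illustrative, since published ITA-to-FST mappings vary.

```python
import numpy as np
from skimage import color  # pip install scikit-image

def median_ita_degrees(rgb_patch):
    """Median individual typology angle (ITA) over a patch of skin pixels.

    rgb_patch: H x W x 3 array (uint8 or float in [0, 1]).
    ITA = arctan((L* - 50) / b*), expressed in degrees.
    """
    lab = color.rgb2lab(rgb_patch)          # convert to CIE LAB
    L, b = lab[..., 0], lab[..., 2]         # lightness, yellow-blue axis
    ita = np.degrees(np.arctan2(L - 50.0, b))
    return float(np.median(ita))

def ita_to_group(ita):
    """Coarse light/dark grouping. The cut-off is one convention from the
    literature; thresholds differ across studies."""
    return "FST V-VI (dark)" if ita < 19.0 else "FST I-IV (light)"
```

Because a brighter exposure raises L* for every pixel, the same skin can land on either side of any fixed threshold, which is the lighting sensitivity noted above.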
Literature Review
Several studies have highlighted the underrepresentation of diverse skin tones in medical educational materials. Louie and Wilkes (2018) and Adelekun et al. (2021) manually evaluated commonly used medical textbooks and found significant underrepresentation of FST V and VI. Lester et al. (2020) observed similar underrepresentation in published photos of cutaneous manifestations of COVID-19. These studies underscore the need for diverse representation in educational materials to improve the training of healthcare professionals and reduce racial disparities in diagnosis and treatment. However, the manual nature of these assessments limits scalability and introduces potential annotator bias. Existing machine learning approaches to skin tone analysis in dermatology have been applied to curated datasets (e.g., ISIC 2018 and SD-19) but not to real-world academic materials. ITA-based methods, though used in earlier work, are limited by their sensitivity to lighting conditions. This research addresses these gaps by developing a robust, scalable machine learning framework for analyzing skin tone representation in diverse educational materials.
Methodology
The STAR-ED framework comprises four stages:

1. **Document Ingestion:** The Corpus Conversion Service (CCS) parses common document formats (e.g., .pdf, .pptx, .docx) and extracts structured content, including images, text, and tables.
2. **Skin Image Selection:** An XGBoost classifier, trained on the DermEducation dataset of dermatology images, distinguishes skin from non-skin images using features such as Histogram of Oriented Gradients (HOG) and statistical measures of the CIE LAB color channels (a sketch of this stage follows the list).
3. **Skin Pixel Segmentation:** An intensity-based segmentation technique masks out non-skin pixels (background, foreground) in images identified as containing skin.
4. **Skin Tone Estimation:** A pre-trained ResNet-18 model, fine-tuned on the Fitzpatrick17k dataset, classifies skin tones into FST I-IV (light) versus FST V-VI (dark).

The DermEducation, SegmentedSkin (Wikimedia Commons images with manual segmentation masks), and Fitzpatrick17k datasets were used for model training and validation, and four medical textbooks served as an external test set (the Medical Textbooks dataset). Performance was evaluated using AUROC, F1 score, accuracy, precision, and recall.
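The paper names the feature families (HOG plus CIE LAB channel statistics) and the classifier (XGBoost) for the skin image selection stage, but not the exact settings, so the following is a minimal sketch under assumed parameters: the resize target, HOG geometry, summary statistics, and XGBoost hyperparameters are all illustrative.

```python
import numpy as np
import xgboost as xgb
from skimage import color, transform
from skimage.feature import hog

def skin_image_features(rgb):
    """HOG texture features plus per-channel CIE LAB statistics.

    The feature families come from the paper; the 128 x 128 resize,
    HOG geometry, and choice of statistics are assumptions.
    """
    gray = transform.resize(color.rgb2gray(rgb), (128, 128),
                            anti_aliasing=True)
    hog_feats = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                    cells_per_block=(2, 2))
    lab = color.rgb2lab(rgb).reshape(-1, 3)
    lab_stats = np.concatenate([lab.mean(axis=0), lab.std(axis=0),
                                np.median(lab, axis=0)])
    return np.concatenate([hog_feats, lab_stats])

# Binary skin / non-skin classifier (hyperparameters are illustrative).
clf = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
# X = np.stack([skin_image_features(im) for im in train_images])
# clf.fit(X, y_train)  # y_train: 1 = skin image, 0 = non-skin image
```

Images flagged as skin then pass to the segmentation stage, so false negatives here drop images from the bias assessment entirely, which is why this stage's AUROC and F1 are reported separately.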
Key Findings
STAR-ED demonstrated strong performance in skin image detection (0.96 ± 0.02 AUROC and 0.90 ± 0.06 F1 score) and skin tone classification (0.87 ± 0.01 AUROC and 0.91 ± 0.00 F1 score) across multiple datasets, including external validation on four widely used medical textbooks. Analysis of these textbooks revealed a marked underrepresentation of brown and black skin tones (FST V-VI), which consistently accounted for less than 10.5% of all skin images. The STAR-ED pipeline outperformed traditional machine learning methods (e.g., Random Forest, AdaBoost) and an ITA-based approach, highlighting the benefit of classifying FST directly from skin images. Performance was consistent across the four textbooks, supporting the framework's generalizability. STAR-ED also generates a bias assessment within minutes, a substantial improvement over manual annotation, which previously took over 100 person-hours.
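The findings above attribute STAR-ED's edge over the ITA baseline to classifying FST directly from pixels rather than thresholding a hand-crafted color statistic. As a concrete illustration, here is a minimal fine-tuning sketch assuming torchvision's ImageNet-pretrained ResNet-18 with a two-class head; the optimizer, learning rate, input shape, and the decision to fine-tune all layers are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-18 with its classifier head swapped for two outputs:
# 0 = FST I-IV (light), 1 = FST V-VI (dark).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One fine-tuning step on a batch of segmented skin crops
    (float tensors shaped [N, 3, 224, 224], ImageNet-normalized)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Learning the decision boundary from pixel data lets the network absorb variation in lighting and color balance that a fixed ITA threshold cannot.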
Discussion
The findings of this study confirm the underrepresentation of darker skin tones in dermatology educational materials, corroborating previous manual assessments. STAR-ED provides a scalable and objective tool to address this issue, facilitating large-scale analysis of various types of educational materials beyond textbooks. The superior performance of the deep learning model (ResNet-18) compared to traditional machine learning methods and ITA-based approaches demonstrates the effectiveness of using raw pixel data for direct FST classification. The consistency of the results across different textbooks highlights the robustness and generalizability of the framework. The speed and efficiency of STAR-ED enable rapid and widespread assessment of bias in skin tone representation, offering a valuable tool for medical educators, publishers, and practitioners.
Conclusion
STAR-ED offers a novel, automated solution for assessing skin tone representation in medical educational materials. Its strong performance and efficiency make it a valuable tool for identifying and addressing biases in representation. Future work will involve piloting STAR-ED among various publishers and content creators, expanding its application to other domains, and refining the skin pixel segmentation and skin tone estimation modules for improved granularity.
Limitations
The current skin pixel segmentation method does not fully exclude diseased or lesional skin, which might affect skin tone estimation. The study focused on classifying skin tones into only two categories (FST I-IV and FST V-VI), potentially overlooking finer-grained differences within these groups. The accuracy of skin tone estimation from images is limited by factors like differences in color balancing across cameras and variations in lighting conditions. Although non-expert labelers showed good agreement with a subset of expert-labeled images, potential labeling biases cannot be entirely ruled out.