
Recent Advancements and Perspectives in the Diagnosis of Skin Diseases Using Machine Learning and Deep Learning: A Review

J. Zhang, F. Zhong, et al.

This review by Junpeng Zhang, Fan Zhong, Kaiqiao He, Mengqi Ji, Shuli Li, and Chunying Li surveys recent advances in machine learning and deep learning for diagnosing skin diseases, addresses current challenges, and proposes directions for future research.


Introduction

The paper addresses the growing burden of skin diseases and the limitations of traditional, experience-based visual diagnosis, which can lack objective criteria and is constrained by access to dermatologists. It explores how AI, specifically machine learning (ML) and deep learning (DL), can improve diagnostic accuracy, especially for early-stage disease, and serve as decision support for both dermatologists and non-specialists. The aim is to comprehensively review recent advances (with emphasis on the last five years) in ML/DL for dermatological image segmentation and classification, identify current challenges (data availability/diversity, generalizability, interpretability, and topic focus), and propose directions to advance computer-aided diagnosis (CAD) for skin diseases.

Literature Review

The review summarizes prior work on skin image segmentation and classification using ML and DL, highlighting the increasingly dominant role of CNN-based DL architectures (e.g., U-Net, FCN, DeepLabv3+, DenseNets) over traditional ML methods (e.g., thresholding, K-means, ICA/FCM, Random Forests, SVMs). It details common datasets employed in dermatology research, including DermNet, MED-NODE, DermIS, ISIC 2017/2018/2019/2020, and Derm7pt, noting typical class compositions and imbalances (e.g., ISIC 2018 seven classes; ISIC 2019 eight classes plus outliers). The literature indicates DL’s superior performance in both segmentation (higher Jaccard/IoU and Dice) and classification (higher accuracy/sensitivity/specificity), yet underscores variability due to task design (binary vs multi-class), imaging modality (dermoscopy, clinical, RCM, VHF ultrasound), region of interest (face vs smooth skin), and dataset size/quality. The review also identifies a strong concentration of studies on melanoma/skin cancer, with relatively fewer works on pigmented disorders such as vitiligo, and discusses emerging trends including transfer learning, attention mechanisms, and multimodal fusion of images with clinical metadata.

Methodology

A systematic literature review was conducted across PubMed, IEEE, SpringerLink, and Web of Science, including only original, English-language journal articles that proposed segmentation and/or classification algorithms for binary or multi-class skin lesions using ML or DL. Exclusions were reviews, case reports, books, and outdated literature. Following PRISMA guidelines, the initial search identified 157,036 records plus 5,287 from snowballing; after deduplication, 131,985 remained. Applying inclusion criteria yielded 1,197 full-text articles screened within the 2015–2023 timeframe, emphasizing the most recent five years. From these, 29 segmentation-focused and 45 classification-focused articles were selected. Studies were categorized into traditional ML vs DL approaches, and a comparative analysis was performed on methods and outcomes. Evaluation metrics considered included accuracy (AC), sensitivity (SE), specificity (SP), Dice (DI), and Jaccard/IoU (JA), with task-dependent prioritization (e.g., JA/DI and SE in segmentation; AC, SE, SP in classification). Representative datasets and imaging modalities (dermoscopy, RCM, VHF ultrasound, clinical images) were documented to contextualize performance.
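The five evaluation metrics named above can all be derived from the counts of true/false positives and negatives. The following is a minimal Python sketch for illustration, not code from the review: the function name `binary_metrics` and the toy masks are hypothetical, and it assumes binary labels where 1 marks the positive class (a lesion pixel in segmentation, or a diseased case in classification).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute AC, SE, SP, DI, and JA from two equal-length 0/1 sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def safe_div(num: float, den: float) -> float:
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "AC": safe_div(tp + tn, tp + tn + fp + fn),  # accuracy
        "SE": safe_div(tp, tp + fn),                 # sensitivity (recall)
        "SP": safe_div(tn, tn + fp),                 # specificity
        "DI": safe_div(2 * tp, 2 * tp + fp + fn),    # Dice coefficient
        "JA": safe_div(tp, tp + fp + fn),            # Jaccard index (IoU)
    }


# Toy example: flattened ground-truth and predicted masks (assumed data).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(truth, pred)
print({name: round(value, 3) for name, value in metrics.items()})
# {'AC': 0.8, 'SE': 0.75, 'SP': 0.833, 'DI': 0.75, 'JA': 0.6}
```

In this toy case accuracy (0.8) is noticeably higher than Jaccard (0.6) because the many correctly labeled background pixels inflate AC, which is one reason overlap metrics such as JA and DI tend to be prioritized for segmentation.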

Key Findings

- Deep learning generally outperforms traditional machine learning for both segmentation and classification of dermatological images.
- Segmentation: U-Net variants, FCNs, DeepLabv3+, MSFCND, and DPFCN consistently achieve strong pixel-level performance. Reported Jaccard/IoU values for DL methods often exceed those of ML (e.g., traditional ML Jaccard typically ≤0.81 versus DL average ≈0.821, with top methods reaching ≈0.94). Examples include: U-Net-based methods achieving JA up to 0.88–0.94; SEDCIS reporting JA 0.94 and DI 0.97; DeepLabv3+ systems achieving high detection rates in small datasets; enhanced U-Net with optimization achieving AC ≈98%.
- Classification: CNN-based and hybrid DL models (e.g., ResNet, Inception/GoogleNet, DenseNet, EfficientNet variants, Visual Transformers, InSiNet, RDCNN) achieve high accuracy and robustness across tasks. Examples include: GoogleNet achieving AC ≈99.29% (binary); AlexNet transfer learning achieving AC ≈98.7% (seven-class); InSiNet AC ≈94.59% (binary); DenseNet/ResNet hybrids ≈95.1% (seven-class); fusion of U-Net and CNN ≈97.96% (seven-class). Some works report very high accuracy on small datasets (e.g., BPNN 99.7% on 400 images), underscoring the need for validation on larger, diverse cohorts.
- Emerging directions: Transfer learning, attention mechanisms (e.g., ECA in Eff2Net), interpretability via Grad-CAM, and multimodal fusion (transformers combining images and clinical metadata) improve performance and transparency.
- Gaps: Research is heavily skewed toward melanoma/skin cancer; pigmented diseases such as vitiligo are underrepresented though initial DL-based systems show promise (e.g., ResNet50/VGG16/Inception v2 color-space ensembles; CycleGAN-based preprocessing; YOLOv3+UNet++ hybrids; LVQ approaches reporting AC >85–92%).
- Persistent challenges: Dataset size/diversity and class imbalance, limited generalizability across demographics/modalities, and the black-box nature of DL models hinder clinical translation.
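The Jaccard and Dice figures quoted in the segmentation findings are two views of the same overlap: for a single pair of masks, DI = 2·JA / (1 + JA). The short Python sketch below (helper names are hypothetical) uses that identity as a sanity check on the reported SEDCIS pair. Note the assumption: the identity is exact per image, but only approximate when a paper reports averages over many images.

```python
def dice_from_jaccard(ja: float) -> float:
    """Convert a Jaccard index (IoU) to the equivalent Dice coefficient."""
    return 2 * ja / (1 + ja)


def jaccard_from_dice(di: float) -> float:
    """Convert a Dice coefficient to the equivalent Jaccard index (IoU)."""
    return di / (2 - di)


# Sanity check on the reported pair JA 0.94 / DI 0.97.
print(round(dice_from_jaccard(0.94), 2))  # 0.97
print(round(jaccard_from_dice(0.97), 2))  # 0.94
```

Because Dice is never lower than Jaccard for the same masks, comparing one study's DI against another study's JA overstates the first; converting to a common metric first gives a fairer, if still rough, comparison.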

Discussion

The review’s findings support the hypothesis that DL has become the dominant and more effective paradigm for dermatological image analysis, delivering superior segmentation (higher JA/DI) and classification (higher AC/SE/SP) compared with traditional ML. This directly addresses the need for more objective and accurate tools in dermatology. However, the gains are tempered by significant barriers: heterogeneous tasks and datasets make cross-study comparisons difficult, dataset imbalances and limited representation of non-melanoma conditions restrict model generalizability, and the lack of interpretability impedes clinical trust and adoption. The results highlight the importance of standardized, diverse datasets, interpretable modeling (e.g., saliency/heatmaps, checklists like ABCDE and seven-point), and methodological innovation (e.g., transformers, reinforcement learning, multimodal fusion) to ensure robustness, fairness, and clinical usefulness. Expanding beyond melanoma to broader dermatologic conditions and leveraging clinical images (including smartphone-acquired) will better align research with real-world practice.

Conclusion

This review synthesizes advances in ML and DL for skin lesion segmentation and classification, demonstrating DL’s superior performance and growing clinical potential. It identifies critical challenges—limited and imbalanced datasets, generalizability gaps across populations/modalities, and insufficient interpretability—and outlines practical future directions: (1) establish larger, standardized, demographically diverse datasets with quality annotations; (2) develop explainable AI tools (e.g., heatmaps, rule-based rationales) to enhance transparency and trust; (3) innovate with modern architectures (e.g., Swin Transformers), transfer learning, and reinforcement learning; (4) fuse multimodal data (images plus clinical metadata, history, and close-up images); and (5) broaden research focus beyond melanoma to include inflammatory and pigmented diseases and more clinical (non-dermoscopic) images. These steps are essential to build reliable, generalizable, and interpretable CAD systems for dermatology.

Limitations

As a review, its findings are constrained by the heterogeneity of the included studies (varying tasks: binary vs multi-class; differing imaging modalities and regions; diverse datasets and sizes), making direct performance comparisons imperfect. Many of the high reported accuracies derive from small or imbalanced datasets, limiting generalizability. The literature is skewed toward melanoma/skin cancer, with fewer studies on pigmented diseases such as vitiligo, which reduces the breadth of conclusions for those conditions. Publication bias, the restriction to English-language articles, and the emphasis on recent years (2015–2023) may omit earlier or non-English works. Differences in preprocessing, segmentation dependence, and evaluation protocols further affect comparability.
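To see why accuracy reported on an imbalanced dataset can mislead, consider the following Python sketch. The data are entirely hypothetical (95 benign and 5 malignant cases, not drawn from any study in the review), and the "model" simply predicts benign for everything.

```python
from typing import Sequence, Tuple


def ac_se_sp(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical imbalanced test set: 95 benign (0) and 5 malignant (1) cases.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that labels every case as benign.
y_pred = [0] * 100

ac, se, sp = ac_se_sp(y_true, y_pred)
balanced_accuracy = (se + sp) / 2
print(ac, se, sp, balanced_accuracy)  # 0.95 0.0 1.0 0.5
```

The trivial classifier scores 95% accuracy while detecting none of the malignant cases, which is why sensitivity, specificity, or a balanced measure is worth reading alongside any headline accuracy.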
