Medicine and Health
Mpox Detection Advanced: Rapid Epidemic Response Through Synthetic Data
Y. Kularathne, P. Janitha, et al.
The dependency on large, extensive datasets significantly impedes the development and training of computer vision models for identifying visual symptoms of new diseases. This need for vast amounts of data is a substantial obstacle during emergent health crises such as novel epidemics, pandemics, or bioterrorism attacks, where relevant data is inherently difficult to acquire. Conventional methods of data collection and compilation are slow and often cannot keep pace with the emergence and identification of new diseases, and delays in developing detection models allow disease to spread and increase pressure on healthcare infrastructure. To address this problem, this paper introduces SynthVision, a novel approach that uses diffusion models to generate synthetic medical images from a minimal set of initial guide images. Synthetic image creation has traditionally relied on Generative Adversarial Networks (GANs); for instance, Aljohani & Alharbe (2022) employed a Deep Pix2Pix GAN to generate synthetic medical images, demonstrating that GANs can produce realistic and clinically relevant images, a principle that aligns with the SynthVision initiative. However, GANs are often criticized for limited output diversity and unstable training, and they require large training datasets, which are typically unavailable in medical crisis situations. Diffusion probabilistic models represent a significant advance in computer vision, with proven efficacy in generating high-quality images. Prior research has shown that these models can produce high-quality medical imaging data, such as MRI and CT scans, and notably enhance the performance of breast segmentation models under data scarcity, a capability that is essential for emerging diseases where data is scarce.
The images generated by diffusion models are both diverse and of high fidelity, making them well suited for training robust vision models. Recent developments in Denoising Diffusion Probabilistic Models (DDPMs), highlighted by Nichol and Dhariwal's work, have markedly improved the capabilities of diffusion models: their methods demonstrated efficient image generation, yielding high-quality results at reduced computational cost. The process effectively reverses diffusion, converting noise back into meaningful images, which supports our project's objective of maximizing output from minimal inputs in vision model development. Additionally, Ceritli et al. (2023) explored the utility of diffusion models in creating realistic synthetic mixed-type Electronic Health Records (EHRs), showing their benefit in producing more accurate synthetic data; this highlights the flexibility of diffusion models across data types, an essential attribute in healthcare where diverse data is crucial. In summary, this paper aims to demonstrate the effectiveness of diffusion models in synthesizing images for healthcare, emphasizing their pivotal role in addressing sparse data availability. This method promises to enrich the datasets needed to train effective vision models, thereby improving diagnostic precision and enhancing patient care amid rapidly evolving medical emergencies.
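To make the "reversing diffusion" idea concrete, the following is a minimal numpy sketch of the closed-form DDPM forward (noising) process that such models learn to invert. The linear beta schedule below is illustrative, not the exact schedule used in the cited work:

```python
import numpy as np

# Illustrative linear beta (noise) schedule; values chosen for
# demonstration, not taken from the paper or Nichol & Dhariwal's settings.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal-retention factor

def q_sample(x0, t, rng):
    """Closed-form DDPM forward step:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))   # stand-in for a normalized image patch
x_mid = q_sample(x0, 500, rng)     # partially noised
x_end = q_sample(x0, T - 1, rng)   # almost pure Gaussian noise

# By t = T-1 the signal coefficient is tiny, so x_T is approximately
# N(0, I); a trained model reverses this chain step by step, converting
# noise back into a meaningful image.
print(round(float(np.sqrt(alpha_bars[-1])), 4))
```

Training the reverse (denoising) network is what DreamBooth-style fine-tuning later specializes toward a target class.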
The paper situates SynthVision within prior work on synthetic data generation for healthcare imaging. Earlier approaches relied on GANs, such as Deep Pix2Pix GAN (Aljohani & Alharbe, 2022), demonstrating feasibility but facing limitations like mode collapse, limited diversity, unstable training, and large data requirements. Diffusion probabilistic models are presented as a superior alternative for data-scarce settings, with evidence that they can generate high-fidelity medical images (e.g., MRI/CT) and improve downstream tasks under data scarcity. Advances in DDPMs (Nichol & Dhariwal) improved efficiency and image quality via better noise-to-image inversion. Diffusion models have also been extended to mixed-type healthcare data (e.g., synthetic EHRs; Ceritli et al., 2023), underscoring their flexibility. Surveys of diffusion models in medical imaging further support their applicability and performance advantages over GANs in clinical contexts.
The study proposes SynthVision, a pipeline to rapidly develop an Mpox lesion detection model primarily from synthetic images generated by personalized text-to-image diffusion models. Computing: All training (diffusion and classifier) used Google Colab with NVIDIA A100 GPUs. Data curation and personalization (2.1): Eight distinct sets of clinically validated Mpox lesion images were curated, each containing 15 images (120 total), covering multiple body parts (face, back, leg, neck, arm) and three Fitzpatrick-based skin tone groupings: fair (types 1–2), brown (types 3–4), and dark (types 5–6). Images were formatted and annotated with detailed descriptive prompts, including a unique identifier for the Mpox class and explicit clinical descriptors to guide model learning. Diffusion model fine-tuning and generation (2.2): State-of-the-art diffusion models (SDXL and SD v2) were fine-tuned with DreamBooth to personalize generation toward Mpox lesion characteristics. From the fine-tuned model, 200 images were generated per validated set (1,600 total). Under clinical guidance, 100–150 images were selected from each set to construct a curated synthetic Mpox dataset of 1,000 images. SDXL's U-Net-based stochastic process was used to transform noise into high-fidelity, diverse outputs. Classification dataset construction (2.3–2.4): The training dataset comprised three balanced classes: Mpox (1,000 synthetic images), Normal (1,000 real patient images), and Other skin disorders (1,000 real patient images), all sourced with patient consent. Validation used 150 real images per class (450 total). Testing used 100 real images per class (300 total), curated by physicians to capture variations in presentation and severity. Model architecture (2.5–2.6): A Vision Transformer (ViT) adapted for 16×16 patches and 384×384 input resolution was employed, improving image analysis compared to standard 224×224 inputs.
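The personalization step pairs each guide image with a caption containing a unique class identifier. A minimal sketch of how such prompts might be assembled is below; the token `sks_mpox`, the descriptor wording, and the helper name are illustrative assumptions, not the paper's actual prompts:

```python
# Hypothetical DreamBooth-style prompt builder. The identifier token
# "sks_mpox" and the clinical descriptors are illustrative placeholders,
# not the prompts used in the paper.

BODY_PARTS = ["face", "back", "leg", "neck", "arm"]
SKIN_TONES = {
    "fair": "Fitzpatrick type 1-2",
    "brown": "Fitzpatrick type 3-4",
    "dark": "Fitzpatrick type 5-6",
}

def build_prompt(body_part: str, tone: str) -> str:
    """Compose a caption with a rare identifier token so the fine-tuned
    diffusion model binds the Mpox class to that token."""
    return (
        f"a photo of sks_mpox skin lesion on the {body_part}, "
        f"{SKIN_TONES[tone]} skin, clinical photograph"
    )

prompts = [build_prompt(bp, t) for bp in BODY_PARTS for t in SKIN_TONES]
print(len(prompts))  # 5 body parts x 3 tone groups = 15 prompt variants
```

Binding the class to a rare token is the core DreamBooth idea: it lets a small guide set (here, 15 images per set) steer generation without overwriting the model's broader visual knowledge.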
Two enhancements were added: attention dropout to improve focus and generalization, and an additional dense layer with 128 neurons to increase capacity for complex pattern recognition. The model’s embedding and transformer stack follow a standard ViT with positional embeddings and class token. The final classifier maps to the target classes. Preprocessing and augmentation used Keras ImageDataGenerator, including rescaling, rotation, and brightness adjustments. Hyperparameters and training (2.7): After tuning, optimal settings were batch size 32, learning rate 1e-4 with Adam optimizer, dynamic learning-rate reduction on plateau, and early stopping to prevent overfitting. The configuration is designed to extract localized features via patches while leveraging higher-resolution inputs for improved performance.
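The benefit of the 384×384 input follows directly from the ViT patch arithmetic: with a fixed 16×16 patch size, a larger input yields more tokens and therefore finer spatial granularity for small lesions. A quick sketch of that arithmetic (the function name is ours, not the paper's):

```python
# Patch-count arithmetic behind the choice of 384x384 inputs for a ViT
# with 16x16 patches: each non-overlapping patch becomes one token, so
# higher resolution means more tokens per image.

def num_patches(image_size: int, patch_size: int = 16) -> int:
    """Number of tokens a square image yields for a given patch size."""
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    return (image_size // patch_size) ** 2

print(num_patches(224))  # 196 tokens for the standard ViT input
print(num_patches(384))  # 576 tokens, nearly 3x the spatial granularity
```

The extra tokens come at quadratic attention cost, which the A100 GPUs and batch size of 32 reported above can absorb for a dataset of this scale.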
- Synthetic image generation: 1,600 images were generated across 8 personalized sets using DreamBooth-fine-tuned SDXL/SD v2; 1,000 high-quality synthetic Mpox images were clinically selected, covering multiple body parts and Fitzpatrick skin tones (fair, brown, dark).
- Dataset balance: Training used 3,000 images (1,000 per class: Mpox synthetic, Normal real, Other real); validation included 450 real images (150 per class); testing included 300 real images (100 per class).
- Classification performance on the 300-image real test set:
  - Mpox: 96 true positives (of 100); precision 0.96; recall 0.96; F1-score 0.96.
  - Normal: 98 true positives; precision 0.97; recall 0.98; F1-score 0.98.
  - Other: 96 true positives; precision 0.97; recall 0.96; F1-score 0.96.
  - Overall accuracy: 0.97.
  - Confusion matrix counts (true class → predicted): Mpox → Mpox 96, Normal 1, Other 3; Normal → Mpox 2, Normal 98, Other 0; Other → Mpox 2, Normal 2, Other 96.
- Model design choices (ViT with 384×384 input, attention dropout, extra dense layer) and training strategy (Adam, LR scheduling, early stopping, augmentation) contributed to robust generalization to real patient images despite synthetic-heavy training for the Mpox class.
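As a sanity check, the reported per-class metrics can be recomputed from the confusion-matrix counts above. This is a verification sketch, not part of the paper's code:

```python
# Recompute precision, recall, and F1 from the reported confusion matrix
# (rows = true class, columns = predicted class).

classes = ["Mpox", "Normal", "Other"]
cm = [
    [96, 1, 3],   # true Mpox
    [2, 98, 0],   # true Normal
    [2, 2, 96],   # true Other
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(3))
print(f"accuracy = {correct / total:.2f}")  # 290/300, rounds to 0.97

for i, name in enumerate(classes):
    tp = cm[i][i]
    precision = tp / sum(cm[r][i] for r in range(3))  # column sum
    recall = tp / sum(cm[i])                          # row sum
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The recomputed values match the reported figures to two decimal places, confirming the table is internally consistent.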
The study addresses the core challenge of rapidly developing reliable diagnostic models when real-world data are scarce, as in emerging outbreaks. By fine-tuning diffusion models to generate clinically realistic, diverse Mpox images and integrating them with real images of normal and other skin conditions, the approach enabled training a balanced classifier that generalized well to real test data. The high per-class precision, recall, and F1-scores (≥0.96 for all classes) and overall accuracy of 0.97 demonstrate that synthetic data can effectively substitute or augment limited clinical datasets to deliver timely, accurate detection. The results indicate that diffusion-generated images captured salient Mpox characteristics across body sites and skin tones, reducing false positives and negatives. Architectural adaptations (higher input resolution, attention dropout, added dense layer) and careful hyperparameter tuning further improved performance, suggesting that model capacity and input detail are important for dermatological lesion classification. Collectively, these findings support the viability of SynthVision as a rapid-response methodology for novel disease detection tasks, potentially accelerating deployment in clinical workflows during epidemics or bioterrorism events.
In conclusion, our study demonstrates the transformative impact of synthetic data on the development of medical diagnostic models, particularly in scenarios demanding rapid and accurate disease detection. Through the SynthVision approach, which leverages advanced diffusion models to generate synthetic images, we achieved a diagnostic accuracy of 97%, with precision, recall, and F1-scores consistently at or above 0.96 in identifying Mpox, normal, and other skin conditions. These results affirm that synthetic data can not only enhance model training but also ensure a model's reliability in clinical settings, paving the way for its adoption in urgent healthcare responses where traditional data acquisition is too slow. This approach sets a new benchmark for deploying artificial intelligence effectively against epidemics and other medical emergencies.

