This study investigated the influence of High Variability Pronunciation Training (HVPT), with and without captions, on the accuracy of English diphthong pronunciation among Saudi EFL learners. Fifty-six undergraduate EFL learners participated, undergoing high-variability (HV) and low-variability (LV) pronunciation training. Assessments included pre-tests, post-tests, generalization tests, and delayed tests, along with a survey of learners' perceptions of YouGlish. Findings indicate that both HV and LV training improved pronunciation, with LV training without captions yielding the highest scores. Students reported positive perceptions of YouGlish.