This study develops and validates models that predict the number of heatstroke cases (all cases, hospital admissions, and deaths) per city per 12-hour period, using weather data and a population-based database from 16 Japanese cities. Machine learning models that incorporated multiple weather variables, together with techniques such as under-sampling and bagging, improved prediction accuracy significantly over models that used only the wet bulb globe temperature (WBGT). The best-performing models achieved mean absolute percentage errors (MAPEs) of 14.8% for all heatstroke cases and 10.6% for hospital admissions and deaths, a level of accuracy the study presents as sufficient for public health applications.
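
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a MAPE can be computed. It uses the standard textbook definition (the mean of |actual - predicted| / |actual|, expressed in percent). How the study aggregates counts before computing the error is not stated in this summary, so the function name `mape`, the zero-handling rule, and the example numbers are all illustrative assumptions, not the study's method or data.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean absolute percentage error, in percent.

    Standard definition; an assumption here, since the summary does not
    say how the study aggregates counts before computing the error.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    # The percentage error is undefined for a zero actual value, so this
    # sketch refuses such input instead of silently skipping it.
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical case counts, made up for illustration (not from the study).
observed = [100, 200, 50]
forecast = [90, 220, 55]
print(round(mape(observed, forecast), 1))  # prints 10.0
```

Each of the three hypothetical forecasts is off by 10% of the observed count, so the mean is 10%. Read the same way, a MAPE of 14.8% means predictions deviate from observed counts by about 15% on average.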