Heatstroke, a severe heat-related illness, poses a significant public health threat, particularly as extreme heat events become more frequent with climate change. Accurate prediction models are crucial for allocating resources such as medical staff and ambulances and for informing citizens about daily heatstroke risk. Wet bulb globe temperature (WBGT) is commonly used for this purpose, but it has limitations in precisely stratifying risk and does not fully account for factors such as high humidity impairing sweat evaporation. Existing models also often lack the granularity to predict spikes in heatstroke cases, and anticipating those spikes is what makes timely intervention possible. This study aimed to develop and validate models predicting the number of heatstroke cases per city per 12 hours in Japan, both for all cases and for cases resulting in hospital admission or death, using multiple weather variables and a comprehensive population-based database covering 16 cities (approximately 10 million people). The goal was to create models with sufficient predictive power for practical public health implementation.
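The outcome unit, cases per city per 12 hours, implies aggregating individual case records into half-day counts. The sketch below shows one way to do this in Python. The record layout (one city name and onset timestamp per case), the city names, and splitting each day at 12:00 are assumptions for illustration, not details taken from the study.

```python
from collections import Counter
from datetime import date, datetime

def half_day_bin(ts: datetime) -> tuple:
    """Map a timestamp to (date, half); half 0 = 00:00-11:59, half 1 = 12:00-23:59 (assumed split)."""
    return (ts.date(), 0 if ts.hour < 12 else 1)

def count_cases(records) -> Counter:
    """Count cases per (city, date, half) key from (city, timestamp) pairs, one pair per case."""
    return Counter((city,) + half_day_bin(ts) for city, ts in records)

# Hypothetical records: city names and timestamps are invented for illustration.
records = [
    ("city_A", datetime(2018, 7, 20, 9, 15)),
    ("city_A", datetime(2018, 7, 20, 14, 40)),
    ("city_A", datetime(2018, 7, 20, 16, 5)),
    ("city_B", datetime(2018, 7, 20, 11, 59)),
]
counts = count_cases(records)

# A Counter returns 0 for absent keys, so empty 12-hour windows read as zero cases.
print(counts[("city_A", date(2018, 7, 20), 1)])  # 2
print(counts[("city_B", date(2018, 7, 20), 1)])  # 0
```

Because missing keys read as zero, periods without any recorded cases remain available to a count model, which needs to see them as well as the busy periods.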
Literature Review
Several studies have explored predicting heat-related illnesses from weather information. However, previous models generally lacked this study's temporal resolution (12-hour intervals) and did not stratify by heatstroke severity. Predicting spikes in heatstroke cases, rather than only average daily counts, has received little attention. This study builds on previous work by using more detailed data and advanced machine learning techniques to improve predictive accuracy both for the overall number of cases and for spikes in cases.
Methodology
This study used data from 16 Japanese cities for June through September of each year from 2015 to 2018. The data comprised heatstroke incidence from a population-based registry and high-resolution weather information from the Weather Company. Data from 2015-2017 formed the training set and data from 2018 the testing set. Prediction models were developed for two outcomes, each counted per city per 12 hours: all heatstroke cases, and cases requiring hospital admission or resulting in death. Five approaches were compared: generalized linear models (GLMs) using WBGT only, GLMs with multiple predictors, generalized additive models (GAMs) with multiple predictors, random forests, and extreme gradient boosting (XGBoost). Model performance was assessed using root mean squared error (RMSE), mean absolute percentage error (MAPE), and total absolute percentage error. To predict spikes in heatstroke cases, under-sampling and bagging were applied to the best-performing model, XGBoost. Features were selected by recursive feature elimination (RFE), and SHAP (SHapley Additive exPlanations) values were calculated to assess the importance of each predictor. City-specific models were also developed and compared with the overall model.
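The spike-oriented step pairs under-sampling with bagging. The sketch below illustrates that combination in plain Python under stated assumptions: a period counts as a spike when its case count reaches a hypothetical `spike_threshold`, each bag holds a bootstrap sample of the spike periods plus a random subset of the non-spike periods, and a one-feature least-squares line (`fit_line`) stands in for XGBoost, which is not reproduced here. The threshold, the `ratio` parameter, the data, and all function names are illustrative, not the study's settings.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Stand-in base learner (NOT XGBoost): ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def undersampled_bagging(xs, ys, spike_threshold, n_models=25, ratio=1.0, fit=fit_line, seed=0):
    """Fit n_models learners on under-sampled bags and average their predictions.

    Each bag = a bootstrap sample of the spike periods (count >= spike_threshold)
    plus round(ratio * number of spikes) non-spike periods drawn without replacement.
    spike_threshold and ratio are hypothetical parameters.
    """
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    spikes = [d for d in data if d[1] >= spike_threshold]
    normal = [d for d in data if d[1] < spike_threshold]
    k = min(len(normal), max(1, round(ratio * len(spikes))))
    models = []
    for _ in range(n_models):
        bag = [rng.choice(spikes) for _ in spikes]  # bootstrap the rare spike periods
        bag += rng.sample(normal, k)                # under-sample the common non-spike periods
        bx, by = zip(*bag)
        models.append(fit(list(bx), list(by)))
    return lambda x: mean(m(x) for m in models)     # bagging: average over the ensemble

# Hypothetical data: daily maximum temperature (deg C) and cases per 12 hours.
temps = list(range(25, 37))
cases = [1, 1, 2, 2, 3, 3, 4, 5, 7, 12, 15, 20]
predict = undersampled_bagging(temps, cases, spike_threshold=10)
print(round(predict(36), 1), round(predict(25), 1))
```

Averaging over many bags keeps the extra weight on spike periods while reducing the variance introduced by discarding non-spike periods in each individual bag.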
Key Findings
The study found that machine learning models using multiple weather variables significantly outperformed models using only WBGT. The best-performing model for all heatstroke cases was a hybrid combining a GAM with an under-sampled XGBoost model, which achieved a MAPE of 14.8% on the testing dataset. For heatstrokes resulting in hospital admission or death, the best-performing model was a GAM, with a MAPE of 10.6% on the testing dataset. These models predicted spikes in heatstroke cases more accurately than simpler models. SHAP values identified the key predictors: high temperature, consecutive hot days, high solar radiation, and a large population (particularly of elderly residents) predicted higher heatstroke counts, whereas lower relative humidity and a higher ratio of men to women were associated with fewer hospitalizations and deaths.
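The headline figures above are MAPE values, so it may help to see the three error metrics from the methodology written out. The sketch below assumes two conventions that the summary does not state: periods with zero observed cases are skipped when computing MAPE (the percentage error is undefined there), and total absolute percentage error is taken as the absolute difference between summed predictions and summed observations, relative to the summed observations. The sample numbers are invented.

```python
from math import sqrt

def rmse(obs, pred):
    """Root mean squared error over all periods."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error (%), skipping zero-case periods (assumed handling)."""
    pairs = [(o, p) for o, p in zip(obs, pred) if o != 0]
    return 100 * sum(abs(o - p) / o for o, p in pairs) / len(pairs)

def total_ape(obs, pred):
    """Error of the summed prediction relative to the summed observation (%), an assumed definition."""
    return 100 * abs(sum(pred) - sum(obs)) / sum(obs)

# Invented counts for four 12-hour periods.
obs = [10, 20, 0, 40]
pred = [12, 18, 1, 30]
print(round(rmse(obs, pred), 2))       # 5.22
print(round(mape(obs, pred), 2))       # 18.33
print(round(total_ape(obs, pred), 2))  # 12.86
```

Under this assumed definition, over- and under-predictions in different periods can cancel in the total measure, so it reflects how well a model captures the aggregate burden rather than individual 12-hour windows.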
Discussion
The high predictive accuracy of the developed models, particularly for identifying heatstroke spikes, underscores their potential for practical public health applications. Integrating multiple weather variables beyond WBGT addresses its inherent limitations and yields more accurate risk assessments. These models can help optimize resource allocation in emergency medicine and public health settings. The findings also highlight the importance of factors such as consecutive hot days and population demographics in heatstroke prediction, extending our understanding of risk beyond simple temperature measures.
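Consecutive hot days emerged as an important predictor. One way to encode it is a running count of successive hot days, sketched below. The 30 °C daily-maximum threshold and the function name are hypothetical choices, since the summary does not say how the study defined a hot day or constructed this feature. In a 12-hour model, each half-day period could take the value for its calendar date.

```python
def consecutive_hot_days(daily_max_temps, threshold=30.0):
    """For each day, count the consecutive days up to and including it whose
    maximum temperature reached threshold (deg C, hypothetical default)."""
    runs, run = [], 0
    for t in daily_max_temps:
        run = run + 1 if t >= threshold else 0  # extend the streak or reset it
        runs.append(run)
    return runs

# Invented daily maxima for one city.
temps = [28.5, 31.0, 32.4, 33.1, 29.9, 30.0, 34.2]
print(consecutive_hot_days(temps))  # [0, 1, 2, 3, 0, 1, 2]
```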
Conclusion
This study successfully developed highly accurate prediction models for heatstroke incidence, capable of identifying periods of increased risk. The use of multiple weather variables and advanced machine learning techniques enhanced predictive power. These models offer a valuable tool for improving public health preparedness and response to heatstroke events. Future research could focus on validating these models in other geographic areas and incorporating individual patient characteristics to further refine risk prediction.
Limitations
The study has several limitations. Data came from only 16 cities in western Japan, which may limit generalizability. Prediction accuracy depends on the accuracy of the input weather data. The models predicted overall counts by severity level and did not account for individual patient characteristics or risk factors. Finally, definitions of heatstroke may vary across countries, which could affect the applicability of the models in international contexts.