Spatio-temporal dynamics of three diseases caused by *Aedes*-borne arboviruses in Mexico
B. Dong, L. Khan, et al.
Aedes aegypti-transmitted arboviruses—chikungunya (CHIKV), dengue (DENV), and Zika (ZIKV)—pose a substantial and growing public health burden globally and across Latin America, including Mexico. DENV cases have increased markedly worldwide, and all three Aedes-borne diseases are reported in 57% of Mexican municipalities. Transmission potential varies by landscape, climate, and socio-economic development; factors such as temperature, rainfall, sanitation, and access to healthcare can modulate risk. This study hypothesizes that there are geographic clusters of CHIKV, DENV, and ZIKV in Mexico and that these clusters are associated with socio-economic and/or climatic parameters. The purpose is to detect spatiotemporal clusters and determine risk factors for these clusters using national, laboratory-confirmed case data (2012–2019) integrated with socio-demographic and climatic variables. Understanding these drivers can inform vector control programs and support predictive models for outbreak anticipation.
Prior work identifies temperature, rainfall, wind speed, and humidity as key drivers of mosquito development and arbovirus transmission, while socio-economic determinants (e.g., limited sanitation, water access, poverty, and barriers to healthcare) are also implicated. For DENV, increased temperatures (e.g., 23.8–33.1 °C) with short lags (1 week to 1–3 months) and greater rainfall (lags of weeks to months) have been associated with larger outbreaks; specific estimates include 1.3–2.1% more DENV cases per 1 mm rainfall increase at 2–3 week lags and 3.3% more cases per 1% rainfall increase in some settings. ZIKV prevalence has been linked to poor municipal water access, and both total rainfall and average temperature have predicted ZIKV infection. Some studies in Mexico highlight climatic drivers for Aedes-borne diseases (ABDs), whereas others suggest socio-economic factors may dominate in certain regions. Despite Mexico’s high endemicity for CHIKV, DENV, and ZIKV, no prior national study had integrated long time series of laboratory-confirmed cases with spatiotemporal socio-demographic and climatic data to jointly examine clustering and risk factors across the country.
Study area: Mexico comprises 32 states and 2469 municipalities, with diverse climates ranging from arid north to humid tropical south, providing varied ecological contexts for Aedes-borne diseases.
Disease data: Daily, individual-level laboratory-confirmed CHIKV, DENV, and ZIKV cases (January 2012–December 2019) were compiled from state public health laboratories via the General Directorate of Epidemiology; data were de-identified and aggregated at the municipality level.
Spatial data: Municipality centroids were created in ArcGIS (v10.7) in UTM projection, and altitude was calculated at municipality centers.
Climate data: Monthly temperature (2 m surface air temperature) from NCEP CFSR and monthly precipitation from CHIRPS were obtained for 2012–2019. Daily mean, minimum, and maximum temperature and rainfall values were prepared as primary climatic parameters.
Population, entomology, urban/rural, and socio-economic data: Socio-economic indicators (illiteracy; population without health services; houses with dirt floors; without toilet facilities; without water pipelines; without sewage; without electricity) from CONEVAL (census 2005 and 2015) were used. An ARIMA model projected socio-economic variables annually for 2012–2019. Population density and rural/urban classification were obtained from CONAPO (rural <10,000; urban ≥10,000 population). Presence points for Aedes aegypti and Aedes albopictus at municipality level (1993–2016) were compiled per national guidelines.
Cluster detection and statistical analysis: Spatial clusters for CHIKV, DENV, and ZIKV were detected using SaTScan v9.6.1 with a discrete Poisson model, latitude/longitude coordinates, scanning for high-rate clusters, no geographic overlap, and Monte Carlo testing (9999 replications) to obtain simulated p-values. Standardized prevalence ratios were computed as observed/expected cases.
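The two quantities named in the cluster-detection step, the standardized prevalence ratio and the Monte Carlo p-value, can be illustrated with a minimal Python sketch. This is not SaTScan's code: the function names are hypothetical, the expected count assumes a single constant national rate with no covariate adjustment, and the p-value uses the standard rank-based Monte Carlo formula. All numbers are invented for illustration.

```python
def expected_cases(population, total_cases, total_population):
    """Expected count for an area if the national rate applied uniformly.

    Assumption: no covariate adjustment, one constant rate everywhere.
    """
    return total_cases * population / total_population


def standardized_prevalence_ratio(observed, expected):
    """Observed cases divided by expected cases, as described in the text."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


def monte_carlo_p_value(observed_stat, simulated_stats):
    """Rank-based p-value: (1 + number of simulated >= observed) / (1 + replications)."""
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    return (1 + exceed) / (1 + len(simulated_stats))


# Hypothetical area: 50,000 of 1,000,000 people, 200 national cases, 25 observed locally.
exp = expected_cases(50_000, 200, 1_000_000)
spr = standardized_prevalence_ratio(25, exp)

# If none of the 9999 simulated statistics reaches the observed one,
# the smallest attainable p-value is 1 / (9999 + 1).
p_min = monte_carlo_p_value(10.0, [0.0] * 9999)
print(exp, spr, p_min)
```

Under this formula, 9999 replications cannot yield a p-value below 1/10000 = 0.0001, so a reported p of 0.0001 means that no simulated data set produced a statistic as extreme as the observed one.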
Feature importance and dependence assessments: To evaluate the stand-alone influence of socio-economic vs climatic factors on cluster prevalence, three methods were used: (1) Pearson correlation coefficients (linear association), (2) Randomized Dependence Coefficient (RDC; nonlinear dependence via copula-based projections), and (3) SHapley Additive exPlanations (SHAP) for feature importance derived from XGBoost models. For each method, feature importances were normalized and aggregated to compare overall socio-economic vs climatic impacts. Two summary metrics were used: majority voting among SHAP, RDC, and Pearson, and a simple average across the three methods.
Machine learning models and evaluation: Six classifiers were compared: XGBoost (gradient-boosted trees), decision tree, SVM with RBF kernel, KNN (k=5), random forest (n_estimators=6), and neural network (100 hidden units). Features included socio-economic variables, population density, urban/rural status, altitude, seasonality, Aedes species presence, and climate variables; outcomes were binary prevalence classes for CHIKV, DENV, and ZIKV. A prevalence threshold of 5 separated Class 0 (infected) from Class 1 (normal). Ten-fold cross-validation was used with shuffled, non-overlapping folds; metrics reported included accuracy, weighted accuracy (to address class imbalance), precision, recall, and F1-score. Stratified analyses by urban vs rural were conducted to assess potential distributional biases and compare associations.
Ethics: Approved by the ethical committee of Universidad de Sonora, Mexico; informed consent was waived due to aggregated, de-identified data.
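The aggregation step in the feature-importance paragraph (normalize each method's scores, sum them by feature group, then combine by majority vote or simple average) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names, feature names, and scores are hypothetical, and taking absolute values before normalizing (so that negative Pearson coefficients count as importance) is an assumption.

```python
def normalize(importances):
    """Scale absolute scores so they sum to 1 (assumption: sign is ignored)."""
    total = sum(abs(v) for v in importances.values())
    if total == 0:
        raise ValueError("all importances are zero")
    return {name: abs(v) / total for name, v in importances.items()}


def group_share(importances, groups):
    """Sum normalized importances within each feature group (e.g. SES, climate)."""
    shares = {}
    for name, value in normalize(importances).items():
        shares[groups[name]] = shares.get(groups[name], 0.0) + value
    return shares


def majority_vote(shares_by_method):
    """Group that wins under the most methods (ties go to the first seen)."""
    votes = {}
    for shares in shares_by_method.values():
        winner = max(shares, key=shares.get)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)


def simple_average(shares_by_method):
    """Mean group share across methods."""
    n = len(shares_by_method)
    names = {g for s in shares_by_method.values() for g in s}
    return {g: sum(s.get(g, 0.0) for s in shares_by_method.values()) / n
            for g in names}


# Hypothetical features and scores for one cluster.
groups = {"no_sewage": "SES", "illiteracy": "SES",
          "mean_temp": "climate", "max_rain": "climate"}
raw = {
    "shap":    {"no_sewage": 0.4, "illiteracy": 0.2, "mean_temp": 0.3, "max_rain": 0.1},
    "rdc":     {"no_sewage": 0.2, "illiteracy": 0.2, "mean_temp": 0.5, "max_rain": 0.1},
    "pearson": {"no_sewage": -0.1, "illiteracy": 0.1, "mean_temp": 0.6, "max_rain": 0.2},
}
shares = {method: group_share(scores, groups) for method, scores in raw.items()}
vote = majority_vote(shares)
avg = simple_average(shares)
print(vote, avg)
```

In this invented example SHAP favors the socio-economic group while RDC and Pearson favor climate, so the majority vote and the simple average both point to climate. The two summaries need not agree in general, which is one reason to report both.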
- Prevalence across municipalities (2012–2019): DENV reported in 60.6% (1498/2469) of municipalities; CHIKV in 29.3% (723/2469); ZIKV in 31.2% (771/2469). Only 2.1% (52/2469) reported all three diseases, while 39.6% (978/2469) reported none. Total laboratory-confirmed cases: 224,701 DENV; 26,211 CHIKV; 12,813 ZIKV. Sixty-seven municipalities consistently reported >1% DENV prevalence; Tomatlán (Jalisco) had the highest DENV prevalence (2.48%). Veracruz experienced sharp increases in all three diseases.
- Spatiotemporal clustering: Twenty-one statistically significant clusters (p=0.0001) identified nationwide: 12 DENV clusters, 6 ZIKV clusters, and 3 CHIKV clusters.
- Drivers of clustering: Both socio-economic (SES) and climatic factors influenced prevalence, with heterogeneity across clusters. For DENV, climatic features dominated in clusters 1, 4, 5, 6, 7, and 12; for ZIKV, climate dominated in clusters 1, 2, 3, and 5, while SES dominated in clusters 4 and 6; for CHIKV, climate dominated in clusters 1–3.
- Feature effects: Altitude and minimum rainfall had marginal influence on model output; average and maximum rainfall had greater importance than minimum rainfall. SHAP-weighted contributions across diseases indicated higher overall impact of socio-economic attributes (e.g., weighted SES SHAP value 0.61 vs climate 0.39 in a summarized ZIKV analysis).
- Model performance: XGBoost outperformed baseline methods (decision tree, SVM, KNN, random forest, neural network) for predicting prevalence classes in both clusters and non-clusters, with higher precision and F1 and generally lower, more stable standard errors. Example (all clusters, DENV, XGBoost): accuracy ~0.91, weighted accuracy ~0.60, precision ~0.86, recall ~0.78, F1 ~0.81; non-clusters, DENV, XGBoost: accuracy ~0.98, weighted accuracy ~0.77, precision ~0.90, recall ~0.77, F1 ~0.84. Accuracy tended to exceed weighted accuracy and precision due to class imbalance.
- Urban-rural stratification: Associations (e.g., temperature and population density with DENV) were slightly stronger in urban areas, but inferences were similar between urban and rural for CHIKV, DENV, and ZIKV.
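The gap between accuracy and weighted accuracy noted in the model-performance results is a general property of imbalanced classes, and a small worked example may help. The sketch below assumes that "weighted accuracy" means the mean of per-class recall (often called balanced accuracy); the summary does not give the exact formula, so this is an interpretation. The labels follow the Class 0 (infected) and Class 1 (normal) convention, and the data are invented.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=0):
    """Accuracy, balanced ("weighted") accuracy, precision, recall, and F1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "weighted_accuracy": (recall + specificity) / 2,  # assumed definition
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented data: 10 infected (Class 0) and 90 normal (Class 1) municipalities.
# The classifier finds 4 of the 10 infected and mislabels 2 normal ones.
y_true = [0] * 10 + [1] * 90
y_pred = [0] * 4 + [1] * 6 + [0] * 2 + [1] * 88
m = classification_metrics(y_true, y_pred, positive=0)
print(m)
```

Here accuracy is 0.92 because the majority class is almost always right, while the balanced figure is about 0.69 because only 4 of 10 infected municipalities are recovered.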
The study confirms the presence of statistically significant geographic clusters of CHIKV, DENV, and ZIKV in Mexico and demonstrates that both climatic and socio-economic determinants contribute to transmission heterogeneity. The dominance of climatic versus socio-economic drivers varies by disease and cluster, reflecting diverse ecological and social contexts across Mexico. Positive associations of mean temperature with all three diseases align with biological plausibility and prior literature. Socio-economic vulnerabilities—such as lack of toilet facilities, water pipelines, sewage, and electricity, and higher illiteracy—were linked with higher prevalence, likely via increased container habitats and reduced capacity for prevention and care. Co-circulation following 2016 further complicates surveillance and control due to overlapping clinical presentations. The comparative machine learning framework, integrating SHAP, RDC, and Pearson analyses, helped delineate the relative contributions of SES and climate features and highlighted XGBoost’s predictive advantage, despite class imbalance. Slightly stronger associations observed in urban settings suggest that denser human populations and urban microclimates may amplify transmission potential. Overall, these findings address the research question by identifying where and when clusters occur and by quantifying the relative influence of key risk factors, supporting targeted interventions and the development of early warning systems.
This national-scale integration of laboratory-confirmed arboviral case data with socio-demographic and climatic variables identified 21 significant spatiotemporal clusters of CHIKV, DENV, and ZIKV in Mexico and demonstrated that both climate and socio-economic factors drive transmission with cluster-specific dominance. Machine learning models, particularly XGBoost, effectively characterized risk patterns, offering a basis for predictive analytics to inform surveillance and vector control. Future research should develop early warning systems that incorporate fine-scale microclimate, landscape ecology, urban environmental features, entomological data on Aedes distribution and human-vector contact, and household-level socio-demographic information to establish causal mechanisms and enhance spatial precision for targeted interventions.
- Use of municipality-level aggregated data may be too coarse to capture fine-scale socio-economic, environmental, and behavioral drivers; potential hidden confounders may remain.
- Reliance on a passive surveillance system and inclusion of only laboratory-confirmed cases may underestimate true incidence and introduce reporting biases.
- Class imbalance (low prevalence relative to normal) affects evaluation metrics; while addressed with weighted accuracy and other metrics, residual bias may persist.
- Socio-economic variables were projected (ARIMA) between census years, introducing modeling uncertainty.
- The spatial scan used non-overlapping, high-rate clusters, which may miss complex overlapping dynamics.

