Generating synthetic population for simulating the spatiotemporal dynamics of epidemics

K. Zhu, L. Yin, et al.

This research by Kemin Zhu, Ling Yin, Kang Liu, Junli Liu, Yepeng Shi, Xuan Li, Hongyang Zou, and Huibin Du presents an approach to generating synthetic populations for epidemic modeling. Using a synthetic population of more than 17 million agents representing Shenzhen, China, it shows that how realistically the population is represented can change epidemic projections, including peak incidence rates.

Introduction

Agent-based microsimulation models are increasingly important in epidemic modeling because they offer more detailed insight into disease transmission dynamics than aggregated models. These models rely on synthetic populations when comprehensive census data is unavailable because of confidentiality or cost concerns. Population synthesis aims to generate a complete population from a smaller sample, such as household travel survey data. Existing methods consider household structures but often oversimplify the interdependencies among household members, which can bias disease transmission simulations. This paper proposes a framework that addresses this limitation by integrating household structures derived from micro-samples and using a heuristic combinatorial optimizer to represent the relationships between individuals within households. The method is then applied to generate a spatially explicit synthetic population for Shenzhen, China.

Literature Review

The literature review covers population synthesis methods and their application in epidemic simulation. Iterative Proportional Fitting (IPF) is widely used but has limitations, including the 'zero-cell problem' (attribute combinations absent from the sample can never appear in the fitted population) and a computational burden that grows with the number of attributes. Improvements such as Iterative Proportional Updating aim to match household and individual attributes simultaneously. Population synthesis approaches are categorized into Synthetic Reconstruction (SR) and Combinatorial Optimization (CO). CO methods replicate existing agents from microdata, while SR uses detailed and summarized data to reconstruct individuals. The review highlights that existing methods do not accurately represent household structures and the interdependencies among members, both of which are crucial for accurate epidemic simulations. Existing epidemic simulation models often rely on open-source tools originally designed for other applications, such as transportation planning, but these can be computationally intensive and may not adequately capture household structures.
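To make the IPF limitation concrete, here is a minimal two-attribute sketch in Python. It is not taken from the paper: the function name `ipf`, the seed table, and the target margins are illustrative assumptions. The routine alternately rescales the rows and columns of a sample cross-tabulation until both sets of marginal totals are matched.

```python
def ipf(seed, row_targets, col_targets, iterations=100):
    """Two-dimensional Iterative Proportional Fitting (illustrative sketch).

    seed        -- sample cross-tabulation (list of rows of floats)
    row_targets -- known marginal totals for each row category
    col_targets -- known marginal totals for each column category
    """
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(iterations):
        # Rescale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [value * target / row_sum for value in table[i]]
        # Rescale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
    return table


# Hypothetical seed: rows are two age groups, columns are two genders.
# The sample contains nobody in the (row 0, column 1) cell.
seed = [[4.0, 0.0],
        [3.0, 3.0]]
fitted = ipf(seed, row_targets=[30.0, 70.0], col_targets=[50.0, 50.0])
```

Because every update is multiplicative, the cell that is empty in the seed stays exactly zero in `fitted` however the margins are chosen. That is the zero-cell problem in miniature: IPF can only reweight combinations that already occur in the sample.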

Methodology

The proposed framework for population synthesis consists of two stages: motif selection and optimization. The first stage preprocesses and encodes household structures from survey data. Individuals are categorized by age and gender, and each household structure is encoded as the count of members in each category. Because the number of possible household structures is very large, only the S most frequent structures (motifs) are kept, with S chosen so that their cumulative share reaches a representativeness threshold α (ΣP(HSᵢ) ≥ α). The second stage uses a heuristic combinatorial optimization algorithm, Motif Heuristic Optimization (MHO), to adjust the weights of the selected motifs so that they match marginal attribute distributions from census data at both city and subzone levels. MHO minimizes the discrepancy between simulated and observed distributions using a mean squared error objective with a penalty term that keeps the weights consistent with the initial motif distribution from the survey data. A trust region-reflective optimizer solves this bound-constrained nonlinear minimization problem. The synthetic population is then generated by sampling motifs, using the optimized weights as choice probabilities.
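The two stages can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the category names, function names, the exact form of the penalty, and the value of `lam` are all hypothetical, and a plain projected gradient descent stands in for the trust region-reflective optimizer so that the example needs only the standard library.

```python
import random
from collections import Counter

# Hypothetical age-gender categories; the paper's actual bins are not given here.
CATEGORIES = ["child_f", "child_m", "adult_f", "adult_m", "senior_f", "senior_m"]

def encode(household):
    """Encode a household (list of category labels) as member counts per category."""
    counts = Counter(household)
    return tuple(counts[c] for c in CATEGORIES)

def select_motifs(households, alpha):
    """Keep the most frequent structures until their cumulative share reaches alpha."""
    freq = Counter(encode(h) for h in households)
    total = sum(freq.values())
    motifs, covered = [], 0.0
    for structure, n in freq.most_common():
        motifs.append((structure, n / total))
        covered += n / total
        if covered >= alpha:
            break
    return motifs

def loss(weights, motifs, target, prior, lam):
    """Mean squared marginal error plus a penalty for drifting from the survey prior."""
    k = len(target)
    sim = [sum(w * m[i] for w, (m, _) in zip(weights, motifs)) for i in range(k)]
    mse = sum((s - t) ** 2 for s, t in zip(sim, target)) / k
    penalty = sum((w - p) ** 2 for w, p in zip(weights, prior)) / len(weights)
    return mse + lam * penalty

def fit_weights(motifs, target, n_households, lam=0.1, steps=500):
    """Fit non-negative motif weights (household counts) to target marginals.

    Assumption: projected gradient descent replaces the trust region-reflective
    solver described in the text; the bound constraint here is simply w >= 0.
    """
    k, s = len(target), len(motifs)
    prior = [share * n_households for _, share in motifs]
    # Step size 1/L, where L is an upper bound on the gradient's Lipschitz constant.
    frob2 = sum(c * c for m, _ in motifs for c in m)
    eta = 1.0 / (2.0 * frob2 / k + 2.0 * lam / s)
    w = list(prior)
    for _ in range(steps):
        sim = [sum(wj * m[i] for wj, (m, _) in zip(w, motifs)) for i in range(k)]
        resid = [si - ti for si, ti in zip(sim, target)]
        grad = [
            2.0 / k * sum(r * c for r, c in zip(resid, m)) + 2.0 * lam / s * (wj - pj)
            for wj, (m, _), pj in zip(w, motifs, prior)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wj - eta * g) for wj, g in zip(w, grad)]
    return w, prior

def sample_households(motifs, weights, n, rng=None):
    """Draw n household structures using the fitted weights as choice probabilities."""
    rng = rng or random.Random(0)
    return rng.choices([m for m, _ in motifs], weights=weights, k=n)
```

In this sketch the weights are household counts per motif, and the marginal targets are person counts per category for a single zone. Matching city and subzone margins at once, as the paper describes, would mean fitting one weight vector per subzone against stacked targets, which is omitted here for brevity.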

Key Findings

A synthetic population of 17.37 million agents was generated for Shenzhen, China, using household survey data and census data. The method successfully matched marginal distributions at both city and subzone levels. Figures show the spatial distribution of different age groups in the synthetic population, demonstrating high consistency with the real data (R² > 0.99). Analysis of the marginal and joint distributions (age and gender) of the synthetic population also showed high accuracy compared to the census data. The MHO method outperformed benchmark methods (Direct Inflating and Iterative Proportional Fitting) in capturing the distribution of household motifs and cross-age interdependencies within households. An agent-based SEIR model showed that while the attack rate was consistent across different synthetic populations, the peak incidence rate and peak date varied, highlighting the impact of household structure representation on epidemic dynamics.
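The three epidemic metrics mentioned above (attack rate, peak incidence rate, and peak date) can be computed from any SEIR run. The sketch below is a deterministic compartmental SEIR stepped with a simple Euler scheme, not the agent-based model used in the study, and every parameter value is a hypothetical placeholder. It only illustrates how the metrics are defined and that peak size and timing can shift while the attack rate stays nearly the same.

```python
def run_seir(beta, sigma, gamma, population=1_000_000, seed_cases=10,
             days=600, dt=0.1):
    """Deterministic SEIR with Euler steps (illustrative, not agent-based).

    beta  -- transmission rate per day
    sigma -- rate of leaving the exposed state per day (1 / latent period)
    gamma -- recovery rate per day (1 / infectious period)
    Returns (attack_rate, peak_incidence_rate, peak_day).
    """
    s = float(population - seed_cases)
    e, i, r = 0.0, float(seed_cases), 0.0
    peak_incidence, peak_day = 0.0, 0.0
    for step in range(round(days / dt)):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        incidence = new_infectious / dt  # new infectious cases per day
        if incidence > peak_incidence:
            peak_incidence, peak_day = incidence, step * dt
    attack_rate = (population - s) / population  # share ever infected
    return attack_rate, peak_incidence / population, peak_day


# Two hypothetical scenarios that differ only in the latent period.
fast = run_seir(beta=0.5, sigma=0.5, gamma=0.25)
slow = run_seir(beta=0.5, sigma=0.2, gamma=0.25)
```

Comparing `fast` and `slow` gives nearly equal attack rates but a later and lower peak for the longer latent period. In the study the driver of such differences is how household structure is represented rather than the latent period, so this is an analogy for reading the metrics, not a reproduction of the result.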

Discussion

The findings demonstrate the effectiveness of the proposed MHO method in generating synthetic populations that accurately reflect both marginal distributions and the complex interdependencies within households. This accuracy is crucial for reliable epidemic simulations, as demonstrated by the variations in peak incidence rate and timing observed when using different synthetic populations. The power-law distribution observed in household structures justifies the motif-selection approach, making the method computationally feasible for large populations. The study highlights the importance of accurately modeling household structures and their influence on epidemic dynamics. The method's adaptability to incorporate other attributes beyond age and gender makes it a valuable tool for future research.

Conclusion

This study introduced a novel MHO method for generating synthetic populations for epidemic simulations. The method effectively captures household structures and interdependencies, improving the accuracy of epidemic projections. Future research could focus on incorporating additional attributes like immunity or socioeconomic factors and evaluating performance on diverse datasets with a greater variety of household structures.

Limitations

The study focused primarily on age-related factors due to data availability. The model did not incorporate other factors, such as income, which might influence disease transmission. The computational performance with a larger number of attributes requires further investigation. The representativeness of the survey data used as input to the model also impacts the reliability of the synthetic population. Further testing on a wider range of datasets is necessary to evaluate the generalizability of the findings.
