Implications of the school-household network structure on SARS-CoV-2 transmission under school reopening strategies in England
J. D. Munday, K. Sherratt, et al.
In the wake of school closures during the SARS-CoV-2 pandemic, this study examines the risks of reopening schools in England. It finds that reopening selected primary year-groups carries relatively low risk of transmission between schools, whereas reopening secondary schools without adequate control measures could link millions of households into potential outbreak clusters. The work highlights the need for close monitoring and effective infection control within schools. The research was conducted by James D. Munday, Katharine Sherratt, Sophie Meakin, Akira Endo, and others.
Introduction
School closures were implemented widely in early 2020 to curb SARS-CoV-2 transmission. As governments planned phased reopening, there was a need to quantify risks associated with returning specific year-groups to in-person schooling. This study constructs a national school-household contact network in England to evaluate potential transmission between schools via shared households under alternative reopening scenarios and varying within-school reproduction numbers (R). The aim is to estimate how different combinations of primary and secondary school years might lead to cross-school outbreaks and the scale of household involvement, informing policy on safe reopening and the importance of within-school controls.
Literature Review
Prior evidence indicates differences in transmission dynamics by age and school level. Outbreaks in primary schools have been reported to be smaller than those in secondary schools in the same areas, and older children may pose a greater risk of onward transmission than younger children. Some analyses suggested schools did not contribute greatly to overall epidemic growth before closures, while others documented reductions in epidemic growth following school closures. Since schools reopened in September 2020, evidence on school-based transmission in the UK has been mixed, and passive surveillance likely underestimates school transmission because children are less often symptomatic. UK prevalence surveys have often found high infection prevalence among 11–18-year-olds, and school-aged children have been estimated to be more likely than adults to introduce infection into households. These findings motivate assessing how reopening specific year-groups may differentially affect transmission between schools and households.
Methodology
Data: Individual-level de-identified records of pupils attending state-funded schools in England (September–December 2019) were provided by the UK Department for Education under a data-sharing agreement, with ethics approval (LSHTM Ref: 22476). Data included each school's unique reference number (URN), school postcode, pupil postcode, and address. Households were defined by combining postcode and address; validation against official address codes (available for 53% of pupils) showed that 99.8% of multi-pupil households were correctly identified as single households, with 0.2% erroneously merged. Boarders and registrations at a pupil's non-main institutions were excluded.
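The household-linkage step can be sketched in Python as grouping pupil records by a (postcode, address) key. This is a minimal sketch, assuming each record carries a pupil id, a school URN, a postcode and a free-text address; the normalisation rule in the hypothetical `household_key` helper (postcodes upper-cased without spaces, addresses case-folded with whitespace collapsed) is an illustrative assumption, not the study's documented matching rule.

```python
from collections import defaultdict

# Hypothetical pupil records: (pupil_id, school_urn, postcode, address).
pupils = [
    ("p1", "S100", "AB1 2CD", "12 High Street"),
    ("p2", "S200", "ab1 2cd", "12 high  street"),
    ("p3", "S100", "AB1 2CD", "14 High Street"),
    ("p4", "S300", "XY9 8ZW", "Flat 2, 5 Mill Lane"),
]


def household_key(postcode, address):
    """Hypothetical normalisation: postcode upper-cased without spaces,
    address case-folded with runs of whitespace collapsed."""
    return (postcode.replace(" ", "").upper(), " ".join(address.split()).casefold())


def group_households(records):
    """Map each household key to the (pupil_id, school_urn) pairs living there."""
    households = defaultdict(list)
    for pupil_id, urn, postcode, address in records:
        households[household_key(postcode, address)].append((pupil_id, urn))
    return dict(households)


households = group_households(pupils)
# p1 and p2 are grouped together despite formatting differences; p3 is a separate household.
```

Exact-key matching like this is what makes erroneous merges possible only when two distinct households share the same normalised postcode and address string.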
Reopening scenarios: Six scenarios reflecting English policy were considered, defined by which year-groups attend in-person: (1) Reception, Year 1, Year 6; (2) add Year 10; (3) add Year 12; (4) add Years 10 and 12; (5) all primary years (Reception–Year 6); (6) all secondary years (Year 7–Year 13). A network was built per scenario using only pupils in included years.
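The six scenarios amount to a lookup from scenario number to the set of year groups attending in person. This Python sketch encodes Reception as year 0 (an assumption for illustration) and filters hypothetical pupil records that carry a `year` field; the helper name `pupils_in_scenario` is likewise hypothetical.

```python
# Assumption: Reception is encoded as year 0; Years 1-13 as integers 1-13.
RECEPTION = 0

SCENARIOS = {
    1: {RECEPTION, 1, 6},
    2: {RECEPTION, 1, 6, 10},
    3: {RECEPTION, 1, 6, 12},
    4: {RECEPTION, 1, 6, 10, 12},
    5: set(range(RECEPTION, 7)),  # all primary years: Reception to Year 6
    6: set(range(7, 14)),         # all secondary years: Year 7 to Year 13
}


def pupils_in_scenario(pupils, scenario):
    """Keep only pupils whose year group attends in person under `scenario`."""
    years = SCENARIOS[scenario]
    return [p for p in pupils if p["year"] in years]


# Hypothetical pupils with an illustrative "year" field.
example = [{"id": "a", "year": 0}, {"id": "b", "year": 10}, {"id": "c", "year": 12}]
print([p["id"] for p in pupils_in_scenario(example, 2)])  # ['a', 'b']
```

Building one filtered pupil list per scenario, and only then linking households, mirrors the text's statement that each network uses only pupils in the included years.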
Contact network between schools: Schools i and j are connected via shared households. A household k with n_ik pupils attending school i and n_jk attending school j contributes n_ik × n_jk unique contact pairs between i and j, so the total number of unique contact pairs between schools i and j is C_ij = Σ_k n_ik × n_jk.
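The contact-pair count C_ij = Σ_k n_ik × n_jk can be computed in a single pass over households. A minimal Python sketch, assuming a hypothetical input that maps each household id to a list of school ids (one entry per included pupil); pairs of pupils at the same school are not counted, since only between-school pairs enter C_ij.

```python
from collections import Counter, defaultdict
from itertools import combinations


def contact_pairs(households):
    """Compute C_ij = sum over households k of n_ik * n_jk.

    `households` maps a household id to a list of school ids, one entry per
    included pupil in that household. Returns {(i, j): C_ij} with i < j.
    """
    C = defaultdict(int)
    for schools in households.values():
        n = Counter(schools)  # n[i] is n_ik for this household k
        for i, j in combinations(sorted(n), 2):
            C[(i, j)] += n[i] * n[j]  # cross-school pupil pairs in household k
    return dict(C)


# Hypothetical example: household h1 has two pupils at school A and one at B.
example = {"h1": ["A", "A", "B"], "h2": ["A", "B"], "h3": ["B", "C"], "h4": ["C"]}
print(contact_pairs(example))  # {('A', 'B'): 3, ('B', 'C'): 1}
```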
Transmission probability network: Transmission between schools is defined as an outbreak in one school seeding an outbreak in an adjacent school through within-household child-to-child transmission. For transmission from school j to school i, the probability for a single contact pair is approximated by P_in^j × q × P_ob, where P_in^j is the probability that a pupil of the source school j is infected during that school's outbreak, q is the per-contact probability of transmission between children in the same household, and P_ob is the probability that this single introduction leads to an outbreak in the recipient school i. Treating the C_ij contact pairs as independent, the school-to-school transmission probability is P_trans^{ji} ≈ 1 − (1 − P_ob P_in^j q)^{C_ij}. Because C_ij = C_ji and all schools share the same parameters, this probability is symmetric (P_trans^{ij} = P_trans^{ji}), so the network can be treated as undirected.
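The edge probability is one minus the chance that none of the C_ij independent contact pairs produces a successful transmission. A minimal Python sketch; the function name `p_trans` and the parameter values in the usage note are illustrative, not taken from the study's code.

```python
def p_trans(c_ij, p_ob, p_in, q):
    """Probability that an outbreak in one school seeds an outbreak in another,
    treating the c_ij household contact pairs as independent."""
    # One pair succeeds if the source pupil is infected (p_in), infects the
    # housemate (q), and that introduction sparks an outbreak (p_ob).
    per_pair = p_ob * p_in * q
    # At least one of c_ij independent pairs succeeds.
    return 1.0 - (1.0 - per_pair) ** c_ij
```

For example, with illustrative values p_ob = 0.5, p_in = 0.4 and q = 0.15, a single contact pair gives a per-pair probability of 0.03, while ten pairs give 1 − 0.97^10, roughly 0.26; schools sharing many households therefore dominate the network.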
Parameterization:
- Within-school reproduction number R varied from 1.1 to 1.5.
- Probability a single introduction leads to a school outbreak: P_ob = 1 − 1/R.
- Probability that a pupil is infected during an outbreak in their school: P_in^j was approximated by the final size Z (the proportion infected), where Z solves the final-size equation Z = 1 − exp(−R Z). This gives final sizes of roughly 18–58% across R = 1.1–1.5, consistent with reported school outbreak sizes.
- Household child-to-child transmission probability q = 0.15, consistent with estimates of the household secondary attack rate. Sensitivity analyses with q = 0.3 and q = 0.08 showed that the scenario rankings were qualitatively robust.
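The two R-dependent parameters can be computed directly. This Python sketch assumes, as above, that P_in is approximated by the final size Z solving Z = 1 − exp(−R Z), the relation that reproduces the quoted 18–58% range; solving it by fixed-point iteration is our choice of method, not necessarily the one used in the study's code.

```python
import math


def p_outbreak(R):
    """Probability that a single introduction causes a school outbreak: 1 - 1/R."""
    return max(0.0, 1.0 - 1.0 / R)


def final_size(R, tol=1e-12, max_iter=100_000):
    """Non-trivial root of Z = 1 - exp(-R * Z), found by fixed-point iteration.

    Starting from Z = 1, the iterates decrease monotonically to the positive
    root when R > 1; for R <= 1 the only root is Z = 0.
    """
    if R <= 1.0:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-R * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


for R in (1.1, 1.3, 1.5):
    print(f"R={R}: P_ob={p_outbreak(R):.3f}, Z={final_size(R):.3f}")
```

Iteration converges more slowly as R approaches 1 (the contraction factor R(1 − Z) approaches 1), which is why the iteration cap is generous.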
Binary outbreak networks and components: For each scenario and R, 1000 realisations of a binary outbreak network were sampled by setting each edge between schools i and j to 1 with probability P_trans^{ij} and 0 otherwise. Connected components of these networks represent sets of schools that could be involved in the same outbreak cluster. Component size was measured both by number of schools and by number of unique households with pupils attending schools in the component (restricted to included years per scenario).
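The sampling of binary outbreak networks and their components can be sketched as follows. The study used NetworkX (where `nx.connected_components` plays the same role); this sketch uses a plain-Python breadth-first search so that it has no dependencies. The inputs `edge_probs` (edge to P_trans) and `school_households` (school to the set of household ids with included pupils) are hypothetical structures, and reporting households for the component with the most schools is our simplifying choice.

```python
import random
from collections import defaultdict, deque
from statistics import median


def connected_components(nodes, edges):
    """Connected components of an undirected graph via breadth-first search."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components


def largest_component_sizes(edge_probs, schools, school_households,
                            n_realisations=1000, seed=0):
    """For each realisation, keep edge (i, j) with probability P_trans and
    return (schools, households) for the component with the most schools."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_realisations):
        kept = [edge for edge, p in edge_probs.items() if rng.random() < p]
        biggest = max(connected_components(schools, kept), key=len)
        households = set().union(*(school_households[s] for s in biggest))
        results.append((len(biggest), len(households)))
    return results


# Hypothetical toy network: A-B usually linked, B-C rarely.
edge_probs = {("A", "B"): 0.9, ("B", "C"): 0.1}
school_households = {"A": {"h1", "h2"}, "B": {"h2", "h3"}, "C": {"h4"}, "D": {"h5"}}
sizes = largest_component_sizes(edge_probs, list(school_households), school_households)
print(median(s for s, _ in sizes))
```

Counting households as a set union matters: a household with pupils at two schools in the same component is counted once.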
Network metrics: The weighted degree of a school in the transmission probability network was defined as the expected number of other schools infected by that school via household links (sum of edge transmission probabilities). Distributions of weighted degree and connected component sizes were summarized across realisations for each scenario and R. Analyses used Python 3.7 and NetworkX 2.4. Code is publicly available at https://github.com/jdmunday/SchoolHouseholdNetworksCOVID (archived: https://doi.org/10.5281/zenodo.4552422).
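Because the weighted degree is defined as a sum of edge transmission probabilities, it is a simple accumulation over edges. A minimal Python sketch over a hypothetical `edge_probs` mapping; in NetworkX the same quantity is given by `G.degree(weight="weight")` when each edge stores P_trans as its `weight` attribute.

```python
from collections import defaultdict


def weighted_degree(edge_probs):
    """Expected number of other schools infected by each school:
    the sum of transmission probabilities on its edges."""
    degree = defaultdict(float)
    for (i, j), p in edge_probs.items():
        degree[i] += p  # edges are symmetric, so both ends get the same weight
        degree[j] += p
    return dict(degree)


# Hypothetical edge probabilities P_trans for three schools.
print(weighted_degree({("A", "B"): 0.2, ("B", "C"): 0.5}))
```

Summing probabilities gives the expectation by linearity, even though the events on different edges are not mutually exclusive.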
Key Findings
- Reopening limited primary year-groups (Scenario 1: Reception, Year 1, Year 6) produced small largest connected components across R=1.1–1.5 (median 3–9 schools; 630–1,631 households). Very few realisations exceeded 10 schools; components typically represented fewer than 1,000 households.
- Adding a single secondary year (Scenario 2: add Year 10; Scenario 3: add Year 12) substantially increased connectivity at higher R. At R=1.5:
• Scenario 2: median largest component 171 schools (112–272) and 29,517 households (21,151–52,983).
• Scenario 3: median largest component 36 schools (22–71) and 7,245 households (4,402–13,766).
- Adding both secondary transition years (Scenario 4: Years 10 and 12 plus Reception, Year 1, Year 6) led to very large clusters at higher R. At R=1.5: median largest component 1,760 schools (1,544–2,228) and 327,433 households (291,536–403,243).
- Opening all primary years (Scenario 5) produced larger clusters than Scenarios 1–3 but far smaller ones than Scenarios 4 and 6. At R=1.5: median largest component 418 schools (257–768) and 126,561 households (76,626–229,320).
- Opening only secondary schools (Scenario 6) generated the highest connectivity among partial reopenings. At R=1.5: median largest component 3,904 schools (3,658–3,998), encompassing 2,450,215 households (2,314,264–2,502,364), about 85% of the schools and 93% of the households included in that scenario.
- Even at R=1.5, the large majority of schools under Scenarios 1–5 remained in small components (fewer than 5 schools; the paper reports counts for each scenario), indicating limited cross-school spread when secondary attendance is restricted.
- Weighted degree (expected number of adjacent schools infected) rose with R and with inclusion of secondary years, illustrating rapid growth in transmission potential as within-school R increases, particularly in networks involving secondary school years.
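A rough worked calculation (not from the paper) shows why transmission potential grows quickly with R. It assumes the parameterization described in the methods, P_ob = 1 − 1/R and q = 0.15, with P_in approximated by the final size Z solving Z = 1 − exp(−R Z) (Z ≈ 0.176 at R = 1.1 and Z ≈ 0.583 at R = 1.5), and a hypothetical pair of schools sharing C = 10 household contact pairs:

```latex
\begin{aligned}
R = 1.1:&\quad P_{ob} P_{in} q \approx 0.0909 \times 0.176 \times 0.15 \approx 0.0024, \quad P_{trans} \approx 1 - (1 - 0.0024)^{10} \approx 0.024\\
R = 1.5:&\quad P_{ob} P_{in} q \approx 0.333 \times 0.583 \times 0.15 \approx 0.029, \quad P_{trans} \approx 1 - (1 - 0.029)^{10} \approx 0.26
\end{aligned}
```

Both P_ob and Z rise with R, so their product rises about twelvefold per contact pair, and the school-to-school probability in this example rises about elevenfold. Since a school's weighted degree is a sum of such edge probabilities, it grows at a similar rate.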
- Table 1 quantified the scale of attendance per scenario: e.g., Scenario 1 involved 17,953 schools (83%) and 1,728,173 households (37%); Scenarios including secondary years engaged a larger share of schools and households, with “All schools” involving 21,583 schools and 4,927,163 households.
Discussion
The analysis indicates that reopening a limited subset of primary year-groups carries relatively low risk of large-scale transmission between schools via shared households, whereas including secondary school years markedly increases network connectivity and the potential size of cross-school outbreak clusters. Even adding a single secondary transition year (Year 10 or 12) substantially elevates risk at higher within-school R, and adding both Years 10 and 12 can generate very large potential clusters. Opening only secondary schools yields the highest connectivity of all partial reopening scenarios.
These results align with empirical observations that outbreaks in primary schools tend to be smaller than in secondary schools, and that older children may have higher onward transmission risk. The findings underscore the critical role of within-school controls: the expected number of adjacent schools infected grows rapidly with the within-school reproduction number R, and higher R facilitates spread across connected components, reducing the window for effective reactive interventions.
The impact of R on the largest components suggests some parts of the school-household network are more tightly connected, implying that particular geographic or structural clusters could be disproportionately affected. Increased connectivity at higher R complicates targeted interventions, analogous to the challenges posed by pre-symptomatic transmission for contact tracing. The model presumes that risk within the school network scales with community prevalence; thus, the risks associated with school reopening would be expected to rise as community transmission increases.
Conclusion
Using nationally comprehensive state-school and household address data, the study constructs a school-household contact network to quantify how different reopening strategies might seed cross-school transmission via shared households. The principal insight is that reopening primary years entails comparatively low cross-school transmission risk, while inclusion of secondary years—especially multiple transition years—greatly amplifies potential outbreak cluster sizes, both in numbers of schools and affected households. Maintaining a low within-school R through infection control (e.g., distancing, masking, cohorting) is likely to be highly influential in limiting inter-school spread.
The framework provides a basis for evaluating and prioritizing reopening strategies and for designing reactive interventions (e.g., targeted school or class closures) should outbreaks be detected. Future research could incorporate explicit within-school contact structures (years, classes, bubbles), heterogeneity in R between schools, dynamic community prevalence, and more detailed intervention modeling to produce operational projections and guide targeted responses. The approach is generalizable to other pathogens for which children contribute substantially to transmission.
Limitations
- Data coverage: Only state-funded schools in England were included; independent schools (≈7% of pupils) were excluded, potentially underestimating network size and connectivity.
- Contact scope: The network captures transmission via schools and households among school-aged children; mixing between children from different schools outside these contexts was not modeled.
- Homogeneous mixing within schools: Within-school contact structure (year/class/gender or “bubble” arrangements) was not modeled; assuming well-mixed populations may overestimate outbreak final sizes.
- Unmitigated final sizes: School outbreaks were assumed to reach theoretical unmitigated final size, not accounting for reactive measures (e.g., class/school closure, testing, isolation) that could limit spread.
- Uniform parameters: The within-school reproduction number R and household child-to-child transmission probability q were assumed uniform across schools; true values likely vary by school and region.
- Immunity: The main analysis assumed no prior immunity among pupils; evidence on the level and heterogeneity of immunity in children was limited, and accounting for it could change the results. Separately, sensitivity analyses varying q altered component sizes but preserved the qualitative ranking of scenarios.
- Dependence on community prevalence: The framework assumes outbreak seeding risk is proportional to community prevalence; results should be interpreted accordingly and are not forecasts.
Overall, the results should be read as upper-bound risk estimates, since they assume no mitigation and simplified mixing within schools.