Identification of temporal condition patterns associated with pediatric obesity incidence using sequence mining and big data


E. A. Campbell, T. Qian, et al.

This study by Elizabeth A. Campbell, Ting Qian, Jeffrey M. Miller, Ellen J. Bass, and Aaron J. Masino uses EHR data to examine temporal condition patterns linked to pediatric obesity incidence. It finds strong associations between obesity incidence and conditions such as asthma and allergic rhinitis recorded before the first obese BMI, suggesting these could be early indicators of obesity.

Introduction
Childhood obesity is a major public health issue in the United States. In 2016, approximately 35% of children and adolescents ages 2–19 years were overweight or obese, with about half of these children being obese. Pediatric obesity elevates risks for numerous comorbidities including diabetes, hypertension, sleep apnea, and psychological issues in childhood and later life. Electronic health records (EHRs) have the potential to support childhood obesity diagnosis, treatment, and surveillance at clinical and population levels. EHR-derived data support surveillance of obesity and associated comorbidities, enable large, diverse cohorts for population studies, and can be combined with community-level environmental data for comprehensive studies. Prior research has used EHRs for obesity diagnosis and quality improvement in clinical settings, to estimate prevalence and demographics of childhood obesity and comorbidities, and alongside other data sources to study environmental influences. However, limited research has examined temporal dependencies of conditions associated with childhood obesity incidence. Understanding temporal condition patterns is important because they may signal impending obesity or conditions likely to follow obesity incidence, informing care and policy. The objective was to identify temporally ordered condition patterns surrounding childhood obesity incidence by examining sequences of conditions recorded at visits immediately before, during, and after the first recorded obese BMI. This study applies the SPADE sequence mining algorithm to a large retrospective cohort to: (1) identify common temporal condition sequences surrounding pediatric obesity incidence and conditions more prevalent before or after incidence; and (2) determine if these patterns occur at significantly different prevalence in patients with obesity compared with matched healthy-weight patients.
Methodology
Study design and setting: Retrospective matched case–control study using the Pediatric Big Data (PBD) resource at the Children's Hospital of Philadelphia (CHOP), encompassing CHOP, its primary care network (>30 sites), and specialty/surgical centers. Data followed OHDSI condition domain standards and included clinical and non-clinical observations. Identifiers were removed prior to analysis; dates were removed after labeling pre-index, index, and post-index visits. IRB approval was obtained with consent waived.
Inclusion criteria for cases: CDC-defined childhood obesity (BMI at or above the 95th percentile for age and sex). Patients had at least one obese BMI measurement during a CHOP primary care visit (the index visit) and at least one prior visit without an obese BMI. Patients were 2–18 years old at the index visit, which occurred between Jan 1, 2009 and Dec 31, 2016. Height and weight measurements taken on the same day were paired to compute BMI; biologically implausible values were excluded per CDC guidelines, and the index BMI z-score had to be biologically plausible. Included encounter types were face-to-face inpatient, ambulatory, or emergency department visits. Patients without a pre-index visit were excluded.
Visit selection for sequence analysis: For each case, the most recent prior visit (pre-index) and the earliest subsequent visit (post-index), if applicable, between Jan 1, 2005 and Dec 31, 2017 were selected. All visits required at least one recorded clinical finding (per the OHDSI condition domain). Conditions were represented by ICD-9-CM and ICD-10-CM codes. After the pre-index, index, and post-index visits and their separations in days were identified, explicit date information was not used further.
Case cohort and visit timing: The final case dataset comprised 397,337 observations for 49,694 patients: 33.4% during pre-index (n=132,786), 40.8% during index (n=161,944), and 25.8% during post-index (n=102,607) visits. About two-thirds (n=33,839) had at least one prior non-obese BMI; about one-third (n=15,660) had a non-obese BMI at the pre-index visit. Time between pre-index and index: mean 303.6 days (SD 462.8), median 125 days. Time between index and post-index: mean 147.8 days (SD 246.3), median 49 days. Over 80% of pre-/post-index observations occurred within one year of the index.
Matched control population: Controls were children with at least one healthy BMI (5th–84th percentile) between 2009 and 2016 and no recorded unhealthy BMI. Starting from 343,998 eligible patients with 4,936,503 visits with clinical conditions, exclusions removed single-visit controls, visits at ages more than 180 days outside the youngest and oldest case index ages, and visits with exceptionally high prior visit counts. The final control pool included 3,622,341 potential visits from 296,751 patients. For each case, one control was matched using R's matchControls by sex, number of prior visits (a proxy for utilization/health status), and age at the matched visit (within 60 days). Youngest cases were matched first; once a control was matched, that control's other visits were removed from the pool. All clinical observations from controls’ matching visits and adjacent visits (pre- and post-, if applicable) were extracted. All controls had pre-index and index visits (n=49,694); 89% had a post-index visit (n=44,208). Matching quality: mean age difference 0.13 days (SD 1.65; median 0); mean difference in number of prior visits 0.34 (SD 4.09; median 0). Control visit settings: 92.1% outpatient, 1.2% inpatient, 5.2% emergency room.
Demographics and visit settings (cases): Majority male (55.3%). Race: 49.4% White, 34.7% Black/African-American, 2.3% Asian; 8.8% Hispanic ethnicity. Insurance at index: 57.1% private/commercial, 38.6% Medicaid/CHIP. Age at index: 30.5% aged 2–4, 42.9% aged 5–11, 26.6% aged 12–18. Case visit settings: 90.1% outpatient, 8.7% emergency room, 1.2% inpatient.
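The case-to-control matching described above can be illustrated with a small greedy sketch. This is not the matchControls implementation the study used; it is a simplified stand-in, written in Python, that assumes exact matching on sex, a hard 60-day age window, and a preference for the closest prior-visit count. The Visit record, its field names, and the tie-breaking rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical record for one candidate visit (not the study's schema)."""
    patient_id: str
    sex: str
    age_days: int      # age at this visit, in days
    prior_visits: int  # number of visits before this one


def match_controls(case_visits, control_visits, max_age_gap_days=60):
    """Greedy 1:1 matching: same sex, age within a window, closest prior-visit count."""
    used_patients = set()
    matches = {}
    # Youngest cases are matched first, as stated in the text.
    for case in sorted(case_visits, key=lambda v: v.age_days):
        candidates = [
            c for c in control_visits
            if c.patient_id not in used_patients
            and c.sex == case.sex
            and abs(c.age_days - case.age_days) <= max_age_gap_days
        ]
        if not candidates:
            continue  # no eligible control visit for this case
        # Assumed tie-breaking: closest utilization first, then closest age.
        best = min(
            candidates,
            key=lambda c: (abs(c.prior_visits - case.prior_visits),
                           abs(c.age_days - case.age_days)),
        )
        matches[case.patient_id] = best
        # Marking the patient as used removes all of their other visits.
        used_patients.add(best.patient_id)
    return matches
```

A linear scan per case is fine for illustration; at the scale reported here (millions of candidate visits) an index on sex and age would be needed.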
Sequence mining and statistics: SPADE sequential pattern mining (R arules package) was applied to case clinical data with minimum support of 0.01 to identify condition items and multi-visit sequences spanning pre-index, index, and post-index timing classes. The control data were then analyzed to measure prevalence of the case-derived patterns in controls. Statistical comparison of paired case–control pattern prevalence used pairwise McNemar’s tests. SPADE runtime was on the order of seconds on standard hardware.
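To make the minimum-support criterion concrete, the sketch below counts how often an ordered condition pattern occurs across patient sequences. It shows only the support definition that SPADE thresholds on, not SPADE's own search procedure, and it is written in Python rather than the R tooling named above. It assumes each patient is represented as an ordered list of condition sets (pre-index, index, post-index); the condition labels and toy data are invented for illustration and are not study data.

```python
MIN_SUPPORT = 0.01  # threshold reported in the text


def contains(sequence, pattern):
    """True if the pattern's itemsets occur, in order, in distinct visits of the sequence."""
    pos = 0
    for itemset in pattern:
        # Advance to the first remaining visit that includes every item in this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later visit
    return True


def support(sequences, pattern):
    """Fraction of patient sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Toy sequences: [pre-index, index, post-index] condition sets per patient.
cases = [
    [{"asthma"}, {"asthma", "allergic rhinitis"}, {"asthma"}],
    [{"otitis media"}, {"asthma"}, set()],  # empty set: no post-index visit
    [{"asthma"}, {"allergic rhinitis"}, {"cough"}],
    [{"fever"}, {"cough"}, {"fever"}],
]
pattern = [{"asthma"}, {"allergic rhinitis"}]

is_frequent = support(cases, pattern) >= MIN_SUPPORT
```

The same support function can be applied to the matched control sequences to obtain the prevalence of each case-derived pattern among controls.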
Key Findings
- SPADE identified 163 temporal condition patterns present in at least 1% of pediatric obesity cases.
- Of these, 80 patterns were significantly more common among cases and 45 were significantly more common among controls (p < 0.05, McNemar’s tests).
- Asthma and allergic rhinitis were strongly associated with childhood obesity incidence, with particularly high prevalence during pre-index and index visits among cases.
- Seven conditions were commonly diagnosed exclusively during pre-index visits for cases, including ear, nose, and throat disorders and gastroenteritis, suggesting potential early indicators prior to recorded obesity incidence.
- Descriptive context: The study analyzed 49,694 cases with 397,337 observations across pre-index, index, and post-index visits; over 80% of pre-/post-index observations occurred within one year of the index visit.
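The paired case–control comparison behind these counts can be sketched as follows. McNemar's test uses only the discordant pairs: those where the case shows a pattern and its matched control does not, and the reverse. The version below is the uncorrected chi-square form with one degree of freedom; the summary does not state whether the study used a continuity correction or an exact variant, so that choice is an assumption, as are the function name and the toy counts.

```python
import math


def mcnemar(case_has, control_has):
    """Uncorrected McNemar test on paired booleans; returns (statistic, p_value)."""
    pairs = list(zip(case_has, control_has))
    b = sum(1 for x, y in pairs if x and not y)  # pattern in case only
    c = sum(1 for x, y in pairs if y and not x)  # pattern in control only
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy matched pairs (invented): 30 case-only, 10 control-only, 5 both, 55 neither.
case_has = [True] * 30 + [False] * 10 + [True] * 5 + [False] * 55
control_has = [False] * 30 + [True] * 10 + [True] * 5 + [False] * 55

stat, p_value = mcnemar(case_has, control_has)
more_common_in_cases = p_value < 0.05 and sum(case_has) > sum(control_has)
```

Pairs in which both or neither member show the pattern do not affect the statistic, which is why the test suits a 1:1 matched design.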
Discussion
The study addressed its objective by mining EHR-derived temporal sequences around the first recorded obese BMI and comparing them to matched healthy-weight controls. The identification of 163 patterns, many of which differed significantly in prevalence between cases and controls, demonstrates that temporally ordered condition trajectories are associated with pediatric obesity incidence. The strong pre-index and index associations of asthma and allergic rhinitis, along with several conditions observed exclusively in pre-index visits (e.g., ENT disorders and gastroenteritis), suggest potential early clinical signals that precede documented obesity. These findings are relevant for clinical practice and population health: recognizing temporally patterned comorbidities may help clinicians monitor at-risk children and tailor anticipatory guidance. For policy and prevention, such patterns could inform timing and targets for interventions. While the analysis reveals associations rather than causation, the temporal patterns provide hypotheses for future studies to test causal pathways and mechanisms linking these conditions with obesity onset.
Conclusion
Applying the SPADE sequence mining algorithm to a large pediatric EHR dataset revealed temporally dependent condition patterns surrounding obesity incidence. Asthma and allergic rhinitis were notably prevalent before and at the time of documented obesity, and several conditions appeared exclusively in pre-index visits, potentially signaling future obesity. These temporal patterns can inform hypotheses for subsequent causal research and may support development of clinical screening strategies and policy interventions aimed at prevention and early management of pediatric obesity.
Limitations
As an observational, retrospective EHR-based study, causation cannot be inferred from the identified associations and temporal patterns. Additional limitations inherent to EHR data (e.g., reliance on recorded diagnoses and measurement timing) may also influence the observed patterns.