Identification of temporal condition patterns associated with pediatric obesity incidence using sequence mining and big data

E. A. Campbell, T. Qian, et al.

This study by Elizabeth A. Campbell, Ting Qian, Jeffrey M. Miller, Ellen J. Bass, and Aaron J. Masino uses EHR data to explore temporal condition patterns linked to pediatric obesity. It finds strong associations between obesity incidence and conditions recorded before it, such as asthma and allergic rhinitis, suggesting these could be early indicators of obesity.

Introduction
Childhood obesity is a significant public health concern in the United States: approximately 35% of children and adolescents were overweight or obese in 2016, which puts them at increased risk of various comorbidities. Electronic health records (EHRs) offer a valuable resource for studying childhood obesity, allowing for large-scale population studies and analysis of associated conditions. While EHRs have been used to study obesity prevalence and comorbidities, research on the temporal dependencies of conditions preceding obesity incidence is limited. This study used the SPADE sequence mining algorithm on a large retrospective EHR cohort to identify sequences of conditions recorded before, during, and after the first recorded obese BMI. Its two aims were to identify common temporal condition sequences associated with pediatric obesity incidence and to determine whether these patterns differ significantly in prevalence between patients with obesity and matched controls with healthy BMIs.
Literature Review
Existing literature highlights the significant public health burden of childhood obesity and its associated comorbidities. Previous research has utilized EHR data for obesity diagnosis, quality improvement, and prevalence estimations, often in conjunction with other data sources to analyze environmental factors. However, a gap existed in understanding the temporal relationships between various conditions and the onset of obesity. This study directly addresses this gap by focusing on the temporal sequencing of conditions leading up to and following an obesity diagnosis.
Methodology
This retrospective, matched case-control study used data from the Pediatric Big Data (PBD) resource at the Children's Hospital of Philadelphia (CHOP). The study included 49,694 patients with pediatric obesity (cases) and a matched control group of patients with healthy BMIs. Cases were identified by a BMI z-score at or above the 95th percentile for age and sex. Controls were matched on sex, number of prior visits, and age at the index visit (within 60 days). Data comprised clinical observations from pre-index, index, and post-index visits. Pre-index and post-index visits were not necessarily face-to-face encounters but had to occur within a defined timeframe. Preprocessing excluded records with biologically implausible values, incomplete data, and non-primary care visits; the data cleaning steps are shown in a flow diagram (Fig. 1), and the control-matching process is shown in a second flow diagram (Fig. 1b), together with a visual comparison of age and prior healthcare visits (Fig. 2). The SPADE algorithm, chosen for its efficiency on large, sparse datasets, was used to identify frequent temporal condition patterns in the case population at a minimum support of 0.01, meaning a pattern had to appear in at least 1% of cases. McNemar's test was then used to compare pattern prevalence between cases and matched controls.
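To make the support threshold concrete, the following is a minimal Python sketch of how the support of one temporal pattern could be computed over patient visit sequences. It is not the SPADE algorithm itself (SPADE enumerates candidate patterns efficiently; this only scores a single given pattern), and the data layout, function names, and toy patients are all hypothetical illustrations rather than the study's actual data or code.

```python
from typing import FrozenSet, List, Sequence

Visit = FrozenSet[str]            # conditions recorded at one visit
PatientSequence = Sequence[Visit]  # visits in temporal order


def contains_pattern(sequence: PatientSequence, pattern: Sequence[Visit]) -> bool:
    """Return True if every itemset in `pattern` is a subset of some visit,
    with the matching visits occurring in the same order as the pattern."""
    position = 0
    for itemset in pattern:
        # Advance to the earliest remaining visit that contains this itemset.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next itemset must match a strictly later visit
    return True


def support(patients: List[PatientSequence], pattern: Sequence[Visit]) -> float:
    """Fraction of patients whose visit sequence contains the pattern."""
    if not patients:
        return 0.0
    hits = sum(contains_pattern(seq, pattern) for seq in patients)
    return hits / len(patients)


# Hypothetical toy data: (pre-index, index, post-index) condition sets.
patients = [
    (frozenset({"asthma"}), frozenset({"obesity", "allergic rhinitis"}), frozenset()),
    (frozenset({"asthma", "otitis media"}), frozenset({"obesity"}), frozenset({"asthma"})),
    (frozenset(), frozenset({"obesity"}), frozenset({"allergic rhinitis"})),
    (frozenset({"gastroenteritis"}), frozenset({"obesity"}), frozenset()),
]

MIN_SUPPORT = 0.01  # the threshold reported in the study

# Pattern: asthma recorded at some visit, obesity at a later visit.
pattern = (frozenset({"asthma"}), frozenset({"obesity"}))
pattern_support = support(patients, pattern)
is_frequent = pattern_support >= MIN_SUPPORT
```

In this toy cohort, two of the four patients have asthma recorded before the obesity visit, so the pattern's support is 0.5 and it clears the 0.01 threshold. Reversing the pattern (obesity first, asthma later) matches only the second patient.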
Key Findings
The SPADE analysis identified 163 condition patterns present in at least 1% of cases. Of these, 80 were significantly more common among cases, and 45 were significantly more common among controls (p < 0.05). Asthma and allergic rhinitis were strongly associated with childhood obesity incidence, particularly during pre-index and index visits. Seven conditions were exclusively diagnosed in cases during pre-index visits, including ear, nose, and throat disorders and gastroenteritis. The time between pre-index and index visits had a mean of 303.6 days (standard deviation 462.8) and a median of 125 days. The time between index and post-index visits had a mean of 147.8 days (standard deviation 246.3) and a median of 49 days. More than two-thirds of the clinical observations recorded during pre- and post-index visits were made within 180 days of the index visit (n=129,095), and a further 13.8% were made between 180 and 365 days of the index visit (n=26,446), so over 80% of these observations fell within a year of the index visit. Demographic characteristics of the study population (Table 1) show that it was majority male (55.3%), with a racial composition of 49.4% White, 34.7% Black or African-American, and 8.8% Hispanic. At the index visit, 57.1% of patients used Private or Commercial insurance and 38.6% used Medicaid/CHIP. By age at the index visit, 30.5% of patients were 2-4 years old, 42.9% were 5-11 years old, and 26.6% were 12-18 years old.
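The case-versus-control comparisons above rest on McNemar's test, which looks only at matched pairs that disagree about whether a pattern is present. The Python sketch below shows the exact (binomial) form of the test. Whether the study used the exact or the chi-square form is not stated in this summary, so the choice here is an assumption, and the function names and pair counts are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def discordant_counts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Count discordant matched pairs.

    Each pair is (case_has_pattern, control_has_pattern).
    Returns (b, c): b = only the case has the pattern,
                    c = only the control has the pattern.
    """
    b = c = 0
    for case_has, control_has in pairs:
        if case_has and not control_has:
            b += 1
        elif control_has and not case_has:
            c += 1
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    Under the null hypothesis, each discordant pair is equally likely
    to favor the case or the control, so min(b, c) follows a
    Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical matched pairs for one condition pattern.
pairs = (
    [(True, False)] * 10   # pattern in case only
    + [(True, True)] * 3   # pattern in both (concordant, ignored)
    + [(False, False)] * 5 # pattern in neither (concordant, ignored)
    + [(False, True)] * 1  # pattern in control only
)

b, c = discordant_counts(pairs)
p_value = mcnemar_exact(b, c)
more_common_in_cases = b > c and p_value < 0.05
```

The direction of a significant result comes from comparing b and c: a pattern counts as more common among cases when case-only pairs outnumber control-only pairs and the p-value falls below the chosen threshold.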
Discussion
This study's novel application of the SPADE algorithm to a large retrospective EHR dataset revealed temporally dependent condition associations with obesity incidence. The strong association of allergic rhinitis and asthma with obesity incidence, particularly at pre-index visits, suggests that these conditions may be early indicators of obesity. The conditions observed exclusively during pre-index visits may also represent early warning signals. The study identifies temporal associations but cannot establish causation, so the findings call for further research into the causal relationships between these conditions and childhood obesity. The large sample size and the use of a matched control group strengthen the study's robustness.
Conclusion
This study demonstrates the utility of sequence mining techniques, specifically SPADE, in identifying temporal condition patterns related to pediatric obesity incidence using EHR data. Allergic rhinitis, asthma, and several other conditions identified in the pre-index period are potential early indicators of obesity. Future studies should investigate the causal relationships between these conditions and the development of obesity, which may inform early intervention and prevention strategies. Further work might also explore other sequence mining algorithms or incorporate additional data sources to improve the accuracy and completeness of the analyses.
Limitations
The study's retrospective nature and reliance on EHR data limit the ability to definitively establish causal relationships. The data are limited to patients within the CHOP healthcare system, potentially affecting the generalizability of the findings to other populations. Information bias might be present due to the nature of EHR data collection practices. Furthermore, while a matched control group was used, residual confounding factors might still exist.