Harnessing household travel survey with smart card data to generate spatiotemporally-diverse activity schedules for transit users

Transportation

K. D. Vo, E. Kim, et al.

A scalable two-stage data fusion framework combines detailed household travel surveys with high-coverage smart-card data to generate complete and diverse activity schedules for transit users, yielding up to 2.92 million unique synthetic schedules that closely match cellular signaling data in Seoul. Research conducted by Khoa D. Vo, Eui-Jin Kim, Huichang Lee, and Prateek Bansal.
Abstract
Current activity-based models (ABMs) rely on household travel survey (HTS) data to generate daily activity schedules for transit users. However, HTS suffers from limited sampling, resulting in low spatiotemporal diversity. Smart card (SC) data offer broader transit coverage but lack sociodemographic attributes, non-transit trips, and trip-level details, making integration with HTS challenging. This study introduces a novel two-stage data fusion framework that combines detailed but sparse HTS data with high-coverage SC data to generate complete, diverse, and up-to-date activity schedules for transit users. In Stage 1, the framework learns a latent class structure to align the spatiotemporal characteristics of transit trips across the two datasets and estimates a fused joint distribution over all attributes except the spatiotemporal details of non-transit trips. Stage 2 imputes these missing spatiotemporal details to complete the full trip chains. A key innovation is the construction of a latent space with optimal complexity that preserves key statistical properties while enhancing the diversity of synthesized activity patterns. The framework ensures scalability by decomposing the fusion task into analytically tractable sub-problems. The model properties are first validated in a controlled experiment. Further validation using data from 3.4 million SC users in Seoul, South Korea, shows that the fused population closely aligns with external cellular signaling data and significantly outperforms HTS alone, generating up to 2.92 million unique synthetic schedules (an 82.8-fold increase over HTS). In sum, the proposed method lays the groundwork for integrating diverse data sources into ABMs, enhancing their ability to generate diverse synthetic mobility patterns, including those of underrepresented segments.
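To make the idea behind Stage 1 concrete, the sketch below shows how a generic latent class model can link an attribute observed in both datasets to an attribute observed only in the survey. It is not the authors' implementation: it assumes local independence given the class, and the class weights, the attribute names (a departure-time bin and an age group), and all probabilities are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical toy setup (all numbers invented for illustration):
# K = 2 latent classes, one attribute shared by both datasets
# (e.g. a departure-time bin) and one attribute seen only in the
# survey (e.g. an age group).
pi = np.array([0.6, 0.4])                  # P(class k)
p_shared = np.array([[0.7, 0.2, 0.1],      # P(shared = s | class k)
                     [0.1, 0.3, 0.6]])
p_survey = np.array([[0.8, 0.2],           # P(survey-only = a | class k)
                     [0.3, 0.7]])


def fused_joint(pi, p_shared, p_survey):
    """Joint P(s, a) = sum_k pi[k] * P(s | k) * P(a | k).

    Assumes the two attributes are independent given the latent class.
    """
    return np.einsum("k,ks,ka->sa", pi, p_shared, p_survey)


def class_posterior(pi, p_shared, s):
    """P(class k | shared = s): how a smart-card record with shared
    value s would be softly assigned to the latent classes."""
    weights = pi * p_shared[:, s]
    return weights / weights.sum()


joint = fused_joint(pi, p_shared, p_survey)      # shape (3, 2)
post = class_posterior(pi, p_shared, s=2)        # shape (2,)
```

In this toy setting, a smart-card record in the last departure-time bin is assigned mostly to the second class, so it would inherit survey-only attributes mainly from that class's distribution. The fused joint keeps the shared attribute's marginal intact while attaching the survey-only attribute through the latent classes.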
Publisher
Transportation Research Part B
Published On
Jan 06, 2026
Authors
Khoa D. Vo, Eui-Jin Kim, Huichang Lee, Prateek Bansal
Tags
activity-based models
data fusion
smart-card data
household travel survey
latent class modeling
synthetic activity schedules
urban mobility diversity