Measuring Chinese mobility behaviour during COVID-19 using geotagged social media data

K. Zhu, Z. Cheng, et al.

This study by Kaixin Zhu, Zhifeng Cheng, and Jianghao Wang explores the profound effects of the COVID-19 pandemic on mobility behavior in China. Utilizing geotagged social media data from Weibo, the researchers reveal striking reductions in visits to workplaces and public venues, alongside a surge in time spent at home, particularly among younger and more educated individuals. Discover the implications of these findings for policymaking!
Introduction

The study investigates how the COVID-19 pandemic altered intra-city human mobility in China across different amenities and socio-demographic groups. Human mobility, crucial for urban planning and public health, underwent substantial changes due to non-pharmaceutical interventions and stay-at-home policies worldwide. While prior research documented large-scale and inter-city mobility shifts in China, a lack of individual-level, publicly available mobility data hindered understanding of within-city behaviours and group heterogeneity. The paper aims to fill this gap by leveraging massive geotagged social media data to quantify changes in visits to six categories of places (Residential, Workplaces, Retail & recreation, Parks, Transit stations, Grocery & pharmacy), assess differences by age, gender, education, and marital status, and characterize adaptations in visiting diversity and travel distances. The work also evaluates the representativeness and reliability of social media-derived mobility indicators against census data and Baidu Qianxi metrics, providing insights relevant for public health policy, economic recovery, and addressing socio-demographic inequalities.

Literature Review

The introduction synthesizes evidence that COVID-19 interventions reshaped human movement networks, altered use of urban amenities, and produced heterogeneous behavioural responses across socioeconomic and demographic strata. Global datasets like Google Community Mobility Reports (GCMR) aided monitoring but exclude mainland China, where analyses often relied on city-aggregated indicators (e.g., Baidu Qianxi). Existing studies describe declines in cross-city movements, changes in network structure, and shifts in contact patterns, and highlight heterogeneity by partisanship, income, and demographics. However, fine-grained, individual-level data describing intra-city interactions with specific amenities and group disparities in China remained scarce. The study positions geotagged social media as a complementary data source to capture detailed behavioural changes and validate against established mobility measures.

Methodology

  • Data: Collected 210 million geotagged Weibo posts from 10 million users spanning 2019–2020. User profiles provided self-declared gender, age, education, and marital status. All data were anonymized and aggregated.
  • POI categorization: Each post’s attached Point-of-Interest (POI) was mapped to six categories aligned with GCMR: Residential, Workplaces, Retail & recreation, Parks, Transit stations, Grocery & pharmacy.
  • Weibo mobility index: Aggregated daily visits per category; applied a 7-day moving average to account for weekly periodicity. Computed percentage change relative to a day-of-week baseline defined as the median over the second half of 2019 (longer than GCMR’s baseline and excluding the Spring Festival) to reduce holiday-induced swings.
  • Representativeness tests: Compared spatial distribution of users vs. resident population at province level using the China Population Census Yearbook 2020. Performed linear regression between provincial shares of users and population, and compared age-gender structures of users vs. national statistics.
  • Cross-validation with Baidu Qianxi: Constructed a Weibo city movement intensity (CMI) index as the proportion of users visiting non-residential categories per city-day. Normalized both Weibo CMI and Baidu CMI, aggregated to provincial and national levels weighted by city residents, applied 7-day moving average, and computed percentage change relative to the pre-pandemic baseline (Jan 1–19, 2020). Calculated Pearson correlations (Jan–Apr 2020).
  • Post-stratification correction: To mitigate sampling biases across gender/age groups, computed post-stratification weights per group g and region r as the square root of the ratio of census group proportion to user group proportion. Reweighted counts of users visiting place a on day i by group, recomputed provincial Weibo CMI, and reassessed Pearson correlations vs. Baidu CMI.
  • Behavioural heterogeneity: Grouped users by gender, age (five brackets), education (high school vs. bachelor+), and marital status (single vs. married). For each group, computed percentage change in visits to residential and non-residential places during the month after Jan 20, 2020, relative to the 2019H2 baseline.
  • Visiting diversity: For each city, computed percentage change in number of distinct POIs visited (residential and non-residential) over time relative to baseline.
  • Mobility distances: Inferred home location per user as the most frequently visited residential POI. For each outing, computed haversine distance from home to destination POI. For each user and category, calculated median distance in baseline vs. outbreak period and compared distributions across distance bands: walkable (<1.5 km), nearby (1.5–3 km), far (3–15 km), distant (>15 km).
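The Weibo mobility index described above (7-day moving average, day-of-week median baseline, percentage change) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, the visit counts are synthetic, and two choices are assumptions the summary does not settle: the moving average is taken as a trailing window, and the baseline is computed on the smoothed series rather than on raw counts.

```python
from datetime import date, timedelta
from statistics import median


def moving_average_7d(daily):
    """Trailing 7-day mean; assumes `daily` has one entry per consecutive date."""
    days = sorted(daily)
    return {
        days[i]: sum(daily[d] for d in days[i - 6 : i + 1]) / 7
        for i in range(6, len(days))
    }


def weekday_baseline(series, start, end):
    """Median value per day of week (0 = Monday) over the dates in [start, end]."""
    buckets = {}
    for d, value in series.items():
        if start <= d <= end:
            buckets.setdefault(d.weekday(), []).append(value)
    return {wd: median(values) for wd, values in buckets.items()}


def percent_change(series, baseline):
    """Percentage change of each day against the baseline for its weekday."""
    return {
        d: 100.0 * (value - baseline[d.weekday()]) / baseline[d.weekday()]
        for d, value in series.items()
    }


# Synthetic data (not from the study): 100 visits/day until Jan 19, 2020, then 50.
first, last = date(2019, 7, 1), date(2020, 2, 29)
daily_visits = {}
day = first
while day <= last:
    daily_visits[day] = 100 if day < date(2020, 1, 20) else 50
    day += timedelta(days=1)

smoothed = moving_average_7d(daily_visits)
baseline = weekday_baseline(smoothed, date(2019, 7, 1), date(2019, 12, 31))
change = percent_change(smoothed, baseline)
print(round(change[date(2020, 2, 29)], 1))  # -50.0
```

Using a per-weekday median rather than a single overall mean keeps one-off spikes in the baseline period from shifting the reference level, which is the stated motivation for the longer 2019H2 baseline.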

Key Findings
  • Within one month after Jan 20, 2020, visits declined sharply to non-residential places vs. 2019 baseline: Workplaces −45.5%, Retail & recreation −57.7%, Parks −38.7%, Transit stations −43.5%, Grocery & pharmacy −16.8%; Residential visits increased +10.7%.
  • Aggregated non-residential visits reached a trough of −55.0% on Feb 9, 2020, then gradually recovered by April; residential visits remained above baseline until September 2020. Visits to parks and transit stations did not return to baseline until the end of 2020.
  • Representativeness and validation: Provincial user share vs. population share showed strong association (regression coefficient 0.897; P<0.001; R-squared=0.683). Age-gender biases exist (more females; underrepresentation of <14 and >60; overrepresentation of 15–24). Weibo CMI vs. Baidu CMI exhibited high temporal concordance: 21 provinces had Pearson r>0.8 (P<0.001). Post-stratification slightly improved r in 16 provinces (by 0.01–0.09) and minimally changed national r, indicating limited impact of sampling bias on cross-validation.
  • Group heterogeneity: Residential visits increased +13.6% for males vs. +9.3% for females; non-residential visits decreased −50.9% vs. −60.4%. Non-residential reductions shrank with age, from −64.6% (18–22) to −30.6% (36–40). Most age groups had residential increases (+27.4% to +39.2%), except 23–25 and 26–30. Education: bachelor+ had a +12% residential change, while high school showed −4.2%. Marital status: singles experienced greater reductions in residential (−14.5%) and non-residential (−62.7%) visits than married individuals.
  • City-level changes: Median proportional change across cities (outbreak vs. baseline): Residential +48.0%; Grocery & pharmacy −4.5%; Workplaces −15.6%; Transit stations −33.3%; Parks −34.3%; Retail & recreation −52.0%. 81.1% of cities increased the proportion of residential visits; only 4.9% (retail & recreation), 9.8% (transit), and 15.2% (parks) saw increases.
  • Visiting diversity: Median number of distinct POIs declined in early 2020, bottoming in Feb (non-residential at 8.9% of baseline; residential at 4.3%), with residential diversity recovering by late 2020, while non-residential remained overall 3.4% below baseline in 2020.
  • Mobility distances: The majority of trips were under 15 km in both periods. After the outbreak, more individuals chose nearby trips for certain categories: +3.0% within 3 km to residential places (excluding home), +4.3% to retail & recreation, +3.8% to parks; fewer chose far trips: −4.1% (residential), −4.5% (retail & recreation), −1.5% (parks).
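The distance bands in the last finding rest on haversine distances from each user's inferred home to the destination POI. The sketch below shows one way to compute them, and it is illustrative only. The function names, the Earth-radius constant, the sample coordinates, and the handling of the exact band boundaries (1.5, 3, and 15 km) are assumptions, since the summary does not specify them.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_band(km):
    """Map a distance to the four bands; boundary handling is an assumption."""
    if km < 1.5:
        return "walkable"
    if km < 3:
        return "nearby"
    if km <= 15:
        return "far"
    return "distant"


# Hypothetical home and destination coordinates for one user and one category.
home = (39.9042, 116.4074)
destinations = [(39.9100, 116.4100), (39.9300, 116.4300), (40.0500, 116.6000)]

trip_km = [haversine_km(*home, lat, lon) for lat, lon in destinations]
user_median_km = median(trip_km)
print(distance_band(user_median_km))
```

Comparing each user's median distance in the baseline and outbreak periods, as the study describes, would then amount to running this once per period and tabulating the resulting bands.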
Discussion

The findings demonstrate substantial, category-specific reductions in non-residential activity and increased residential presence immediately following China’s COVID-19 emergency response, with staggered recoveries across amenities. The strong alignment between Weibo-derived mobility indicators and Baidu Qianxi, and the limited changes after post-stratification, support the utility of geotagged social media to monitor intra-city mobility where other individual-level data are scarce. The study reveals pronounced heterogeneity: younger, higher-educated, and unmarried individuals reduced outings more, highlighting asymmetric social impacts and potential exacerbation of inequalities. City-level analyses show consistent increases in residential activity and widespread declines in retail/recreation and transit use, while visiting diversity contracted, particularly for non-residential POIs. Shorter travel distances to parks and retail/recreation suggest adaptation to restrictions by substituting nearby amenities. Collectively, these results address the core research questions by quantifying intra-city mobility changes across amenities and groups, validating representativeness, and elucidating behavioural adaptations relevant for policy targeting public health, economic recovery, and equity.
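The post-stratification check referred to above can be sketched in a few lines of Python. The weight formula follows the description in the Methodology (square root of the ratio of census group proportion to user group proportion). The function names, group labels, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def poststrat_weights(census_share, user_share):
    """Weight per group: sqrt(census proportion / user proportion)."""
    return {g: math.sqrt(census_share[g] / user_share[g]) for g in census_share}


def reweighted_count(counts_by_group, weights):
    """Weighted number of users visiting a place on a given day."""
    return sum(weights[g] * n for g, n in counts_by_group.items())


# Hypothetical shares for one region: group "A" is overrepresented among users.
census_share = {"A": 0.2, "B": 0.8}
user_share = {"A": 0.8, "B": 0.2}
weights = poststrat_weights(census_share, user_share)

# Hypothetical raw counts of users visiting one place on one day, by group.
raw_counts = {"A": 80, "B": 20}
adjusted = reweighted_count(raw_counts, weights)
print(weights, adjusted)
```

In this toy case the overrepresented group is down-weighted (0.5) and the underrepresented group is up-weighted (2.0). The square root damps the correction compared with using the plain ratio.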

Conclusion

This study proposes and validates a framework that uses large-scale geotagged social media data to quantify intra-city mobility changes across amenities and socio-demographic groups in China during COVID-19. It provides detailed estimates of declines across non-residential categories and increases in residential visits, documents heterogeneous responses by age, gender, education, and marital status, and characterizes adaptations in visiting diversity and travel distances toward nearby amenities. The approach complements existing city-level datasets and fills gaps left by GCMR’s lack of coverage in mainland China. Future research should extend analyses with long-term data (e.g., 2019–2023) to track post-pandemic recovery trajectories, and integrate mobility with public health, policy, environmental, and socioeconomic factors to assess correlations or causal effects, guiding resilience-building and preparedness for future emergencies.

Limitations
  • Sampling bias: Social media users who opt to geotag posts are not a random sample; observed biases by age and gender remain despite post-stratification, and unobserved biases may persist.
  • Spatial-temporal sparsity: Geotagged posts are unevenly distributed across space and time, potentially affecting stability of estimates, especially around holidays and in low-activity periods.
  • Underrepresentation of less-populated areas: Some cities/regions have insufficient check-ins, which may bias national-level inferences.
  • Data scope: The dataset reflects user check-ins rather than continuous trajectories; complementary data (e.g., mobile phone positioning or cellular signaling) could improve coverage and robustness.