Transportation
A new model for residential location choice using residential trajectory data
Y. Cui, P. Zhao, et al.
Explore a groundbreaking study that unveils a new residential location choice model, revealing how commuting time and home-based non-commuting time can significantly influence urban living. Conducted by Yanzhe Cui, Pengjun Zhao, and their team, this research offers insightful implications for urban planning in rapidly changing environments.
Introduction
More than 55% of the world’s population lives in cities, and residential location choices (RLCs) have substantial implications for urban sustainability, influencing local economies, spatial structures, environments, transport systems, and epidemic control. At the individual level, residential satisfaction contributes to overall life satisfaction. Existing RLC models are commonly of two types: (1) urban system models (e.g., MUSSA II, RELU-TRAN, UrbanSim) that integrate land, labour, industry, and transport interactions, focusing on interdependencies rather than the core nature of location choice; and (2) discrete choice models (often Multinomial Logit) that examine how individual and location characteristics (e.g., age, gender, household size, access to infrastructure) affect choice. However, the high dimensionality of household and spatial attributes complicates calibration. Travel behaviours are pivotal for RLCs: people prefer neighbourhoods enabling satisfying trips, and low travel satisfaction can prompt relocation to neighbourhoods aligned with preferred modes. Thus, RLCs depend not only on amenities but also on personal preferences. No existing RLC model explicitly incorporates individual travel behaviour. This study addresses that gap by constructing an RLC model grounded in home-based travel time allocation (commuting vs. home-based non-commuting, HBNC), thereby reducing data needs and simplifying structure. Using large-scale mobile phone trajectory data for Beijing and Shenzhen (2018–2020), we identify homes, workplaces, and other stays to quantify commuting and HBNC times. We emphasise revealed preferences over stated preferences, extend RLC models to include individual preferences proxied by home-based travel, and demonstrate applications for group-level spatial distribution and individual-level choices, including dynamic change assessment and prediction.
Literature Review
The literature distinguishes two broad RLC modeling traditions. Urban integrated models (e.g., MUSSA II, RELU-TRAN, UrbanSim) analyse RLC through interactions among land markets, labour markets, industrial distribution, and transport, prioritising sub-module interdependencies. Discrete choice approaches (e.g., MNL) evaluate how socio-demographics and spatial attributes (age, gender, household composition, accessibility, etc.) shape residential choices. Prior work notes that travel behaviours and satisfaction affect residential preferences and relocation, with potential self-selection (individuals choosing neighbourhoods suiting preferred travel modes). Despite extensive work on amenities and demographics, individual preferences as expressed in revealed travel behaviour are rarely integrated into RLC models. This study leverages home-based travel time, especially HBNC time, as a proxy for preferences and built environment consumption, aligning with evidence that residents minimise mandatory commuting costs and maximise utility-bearing optional travel.
Methodology
Analytical framework: The study develops an RLC model from both population and individual perspectives and shows they converge to the same form.
Population-level (gravity) formulation: A constrained gravity model balances attraction (benefits) and deterrence (costs). The model considers the attractiveness m_i of residential location i and the deterrence f(r_ij) between residence i and workplace j, with commuting cost as deterrence and housing expenditure as a budget constraint. The novelty is to proxy attractiveness using HBNC time, reflecting revealed preferences and built environment consumption.
- Attractiveness: m_i = exp[a log(hc_i) + γ HBNC_time_i]
- Deterrence: f(r_{ij}) = exp(−β C_time_{ij})
- Commuting time: C_time_{ij} = (Time_{ij} + Time_{ji}) / (N_i + N_j)
- HBNC time: HBNC_time_i = (Time_{is} + Time_{si}) / (N_i + N_s)
Here, i denotes residential tile, j workplace tile, s non-work site; Time terms are total travel times between respective origins/destinations over a month; N terms are corresponding trip counts. Substituting yields the estimable RLC form:
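T_{ij} = O_j Prob_{ij} = O_j exp[a log(hc_i) + γ HBNC_time_i − β C_time_{ij}] / Σ_k exp[a log(hc_k) + γ HBNC_time_k − β C_time_{kj}], where T_{ij} is the number of residents who work in j and live in i, O_j is the total number of workers in j, Prob_{ij} is the probability of choosing residential tile i given workplace j, and k runs over all candidate residential tiles. Time variables enter exponentially; other variables enter in power-law form (for housing expenditure, exp[a log(hc_i)] = hc_i^a).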
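As a rough illustration of the choice probability, the sketch below evaluates Prob_{ij} and the implied flows T_{ij} for a single workplace tile and three candidate residential tiles. It assumes the probability is normalised over all candidate tiles k (the same softmax structure that appears in the individual-level utility) and that commuting time enters with a negative sign, as in the deterrence term. The function name `choice_probabilities`, the tile data, and the parameter values are hypothetical and are not taken from the paper.

```python
import math

def choice_probabilities(tiles, a, gamma, beta):
    """Return {tile_id: Prob_ij} for one workplace tile j.

    tiles maps a residential tile id to (hc, hbnc_time, c_time):
    housing cost, HBNC time in minutes, commuting time to j in minutes.
    """
    scores = {
        tile: a * math.log(hc) + gamma * hbnc - beta * c_time
        for tile, (hc, hbnc, c_time) in tiles.items()
    }
    # Subtract the largest score before exponentiating (numerical stability).
    top = max(scores.values())
    weights = {tile: math.exp(s - top) for tile, s in scores.items()}
    total = sum(weights.values())
    return {tile: w / total for tile, w in weights.items()}

# Hypothetical tiles: (housing cost, HBNC time, commuting time)
tiles = {
    "A": (60000.0, 25.0, 20.0),
    "B": (60000.0, 25.0, 40.0),
    "C": (45000.0, 30.0, 40.0),
}
# Hypothetical parameters; a < 0 so costlier tiles are less likely.
probs = choice_probabilities(tiles, a=-0.8, gamma=0.005, beta=0.014)

O_j = 1000  # hypothetical number of workers in workplace tile j
flows = {tile: O_j * p for tile, p in probs.items()}  # T_ij = O_j * Prob_ij
```

With these made-up inputs, tile A (shorter commute) receives a higher probability than tile B, and tile C (cheaper, more HBNC time) also beats B at the same commuting time.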
Individual-level (utility maximisation): For a risk-neutral individual o working in j and residing in i, utility depends on commuting time as an iceberg cost and HBNC time allocation between consumption (α C_i) and travel-related utility (λ_i), with λ_i drawn from an extreme value distribution:
U_{io} = [exp(−β C_time_{ij})/Σ_k exp(−β C_time_{kj})] × (α C_i/λ_i) × [1/(1−exp(−λ_i))]
Subject to α C_i + λ_i = μ_{io} HBNC_time_i, with F(λ_i) = exp(−λ_i). At equilibrium, aggregating over individuals yields the same choice probability structure as the gravity model.
Study area: Beijing and Shenzhen, two Chinese megacities with distinct geographies and urban structures (Beijing more monocentric; Shenzhen polycentric and geographically constrained). Population during 2018–2020: Beijing ~21–22 million; Shenzhen ~16.66–17.63 million.
Data and processing:
- Mobile phone signalling data: Trajectories from >12 million regular users in Beijing and >4 million in Shenzhen for November 2018, 2019, and 2020. Regular users are those who appear on more than 10 days per month. The operator computes user positions via multi-base-station weighting, and stays longer than 30 min are treated as stay points. The residence is the longest stay between 20:00 and 05:00; the workplace is the longest stay on weekdays between 05:00 and 20:00. Trips are identified between stay points and classified by purpose as commuting or HBNC. Extreme values are truncated (commuting >180 min, HBNC >300 min). Data are aggregated to spatial tiles; only tiles with ≥5 identified residents are used, and monthly averages of commuting time and HBNC time are computed per residential tile.
- Housing expenditure: Listing prices scraped from public websites and government guideline prices. To mitigate sparse listings, averaged at neighbourhood (jiedao) level and assigned to tiles by tile-centre’s jiedao.
- Control variables (POI): Distances from tile centre to nearest subway, bus station, hospital, retail market, park, and school derived from OpenStreetMap.
- Instrumental variables: Monthly precipitation per tile; percentages by gender and age per tile used for endogeneity checks.
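The home and workplace rules in the data-processing steps can be sketched roughly as follows. This is a minimal illustration, not the operator's pipeline: it assumes each user's stays arrive as (tile, start, end) records, and it interprets "longest stay" as the tile with the most accumulated minutes inside the relevant time window, which is an assumption. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def minutes_in_window(start, end, in_window):
    """Count the minutes of [start, end) whose timestamp satisfies in_window."""
    count = 0
    t = start
    while t < end:
        if in_window(t):
            count += 1
        t += timedelta(minutes=1)
    return count

def is_night(t):
    # The 20:00-05:00 window wraps past midnight.
    return t.hour >= 20 or t.hour < 5

def is_weekday_daytime(t):
    # Monday is 0 and Friday is 4; window is 05:00-20:00.
    return t.weekday() < 5 and 5 <= t.hour < 20

def infer_home_and_work(stays, min_stay_minutes=30):
    """stays: list of (tile_id, start, end) for one user.

    Returns (home_tile, work_tile); either may be None.
    """
    night = defaultdict(int)
    day = defaultdict(int)
    for tile, start, end in stays:
        if end - start <= timedelta(minutes=min_stay_minutes):
            continue  # only stays longer than 30 min count as stay points
        night[tile] += minutes_in_window(start, end, is_night)
        day[tile] += minutes_in_window(start, end, is_weekday_daytime)
    home = max(night, key=night.get) if any(night.values()) else None
    work = max(day, key=day.get) if any(day.values()) else None
    return home, work

# Hypothetical stays for one user (4 November 2019 was a Monday).
stays = [
    ("H", datetime(2019, 11, 4, 21, 0), datetime(2019, 11, 5, 7, 0)),
    ("W", datetime(2019, 11, 5, 8, 0), datetime(2019, 11, 5, 18, 0)),
    ("S", datetime(2019, 11, 5, 18, 30), datetime(2019, 11, 5, 18, 50)),
]
home_tile, work_tile = infer_home_and_work(stays)
```

Here tile H accumulates the most night-time minutes and tile W the most weekday daytime minutes, while the 20-minute stop at S is too short to be a stay point. A user who stays home on weekdays would get the same tile for both, a case a real pipeline would need to handle.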
Empirical implementation:
- EVT hypothesis testing: Weighted fits of commuting and HBNC time distributions using Generalized Extreme Value (GEV) distribution to verify alignment with EVT.
- Model estimation: Generalized Linear Models (GLM) to estimate the relationship between the probability of a tile being chosen as a residential location (conditional on workplace) and key predictors: commuting time, HBNC time, and housing price; robustness checks include adding control variables (amenities), instrumental-variable estimation (2SLS) for endogeneity, spatial scale tests (tiles of 250 m, 500 m, 1000 m, 2000 m), and inclusion of time-lagged dependent variable (Prob. t−1) to capture historical dependence.
- Applications: (1) External shock analysis by estimating pre-pandemic (2018–2019) vs post-pandemic (2020) parameters and examining RAV = |coeff(commuting time)/coeff(HBNC time)|; (2) Prediction: Train on 2019, predict 2020 RLC distributions across scales and assess via probability P–P plots and rank comparisons.
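The rank comparison used to assess predictions could look roughly like the sketch below. The summary does not say which rank statistic was used, so Spearman's rank correlation is an assumption here, implemented with the standard library only. The predicted and observed probabilities are made-up values for four tiles.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(predicted, observed):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(observed))

# Hypothetical tile probabilities: model trained on 2019 vs. observed 2020.
predicted_2020 = [0.30, 0.10, 0.25, 0.35]
observed_2020 = [0.20, 0.12, 0.28, 0.40]
rho = spearman(predicted_2020, observed_2020)
```

A positive `rho` means tiles the model ranks as more likely also tend to be chosen more often in the observed data, which is the kind of agreement reported for the 2020 predictions.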
Key Findings
- EVT alignment: Both commuting time and HBNC time distributions fit GEV, indicating residents minimise mandatory commuting time and maximise optional HBNC time under time and budget constraints.
- Core regression results (Table 3):
  - Shenzhen: commuting time coefficient −0.0316 (t ≈ −35.83, p<0.001); HBNC time 0.0058 (t ≈ 6.52, p<0.001); housing price −1.7164 (t ≈ −46.30, p<0.001); R²=0.1592; N=21,221.
  - Beijing: commuting time −0.0137 (t ≈ −33.90, p<0.001); HBNC time 0.0050 (t ≈ 9.42, p<0.001); housing price −0.7952 (t ≈ −48.60, p<0.001); R²=0.0581; N=56,516. Control variables (bus stations, hospitals, retail markets, parks, schools) generally positive and significant in Beijing.
- Relative importance (Wald test, Table 4): The commuting time coefficient is significantly larger in magnitude than the HBNC time coefficient in both cities (difference in absolute values: Beijing 0.0087, Shenzhen 0.0258; p<0.001), confirming the stronger impact of commuting on RLC.
- Robustness:
  - Adding amenities increased R² by ~2% (Shenzhen) and ~11% (Beijing), with core coefficients’ signs and significance unchanged; HBNC time acts as a proxy for amenity access and preferences.
  - Endogeneity: IV (precipitation, age share, gender share) 2SLS models pass weak identification tests; results robust (Supplementary Table 3).
  - Scale effects: Across 250 m, 500 m, 1000 m, and 2000 m tiles, commuting time, HBNC time, and housing price remain significant (1% level) with consistent signs; RAV>1 across scales, indicating commuting time’s greater influence.
  - Time-lagged terms (Table 5): Including Prob. t−1 significantly improves fit: Shenzhen R²=0.2965; Beijing R²≈0.1469–0.1508; lag coefficient ≈0.28–0.285 (p<0.001). Core variable signs and significance persist.
- External shock (COVID-19): Signs and significance of commuting and HBNC time unchanged post-pandemic, but RAV increases, indicating commuting time became relatively more influential compared to HBNC time after COVID-19.
- Prediction: Models trained on 2019 positively correlate with 2020 observed distributions across spatial scales. Predicted ranks correlate positively with observed ranks, supporting predictive usefulness.
- Descriptive statistics (Table 2): Average commuting times (min): Shenzhen ~20.2–24.8 pre/post; Beijing ~31.6–33.5; HBNC means (min): Shenzhen ~23.8–28.6; Beijing ~26.1–28.9. Housing price means (yuan): Shenzhen ~55,359–63,234; Beijing ~62,660–63,323.
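As a quick arithmetic check on the reported coefficients, the sketch below applies the RAV definition, |coeff(commuting time)/coeff(HBNC time)|, to the Table 3 values quoted above and also computes the gap between the absolute coefficients. The ratios are only illustrative: they come from the pooled Table 3 figures in this summary and need not match the period-specific RAV values in the paper.

```python
# Table 3 coefficients as quoted in the findings above.
coefficients = {
    "Shenzhen": {"commuting": -0.0316, "hbnc": 0.0058},
    "Beijing": {"commuting": -0.0137, "hbnc": 0.0050},
}

def rav(commuting_coef, hbnc_coef):
    """RAV = |coeff(commuting time) / coeff(HBNC time)|."""
    return abs(commuting_coef / hbnc_coef)

for city, c in coefficients.items():
    ratio = rav(c["commuting"], c["hbnc"])
    gap = abs(c["commuting"]) - abs(c["hbnc"])
    print(f"{city}: RAV = {ratio:.2f}, |commuting| - |HBNC| = {gap:.4f}")
# Shenzhen: RAV = 5.45, |commuting| - |HBNC| = 0.0258
# Beijing: RAV = 2.74, |commuting| - |HBNC| = 0.0087
```

Both ratios exceed 1, consistent with the finding that commuting time carries more weight than HBNC time, and the gaps reproduce the values listed for the Wald test.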
Discussion
The study addresses the gap in RLC modeling by directly incorporating individual residential preferences via revealed home-based travel behaviour. Empirical evidence shows residents minimise commuting time (mandatory cost) and seek to maximise HBNC time (utility from optional travel), validating the theoretical assumptions and linking them to an EVT-based heterogeneity structure. The gravity-style population model and the individual utility-maximisation model converge to the same probability form, providing a unified framework at both aggregate and individual levels. Across two structurally different megacities and multiple spatial scales, commuting time consistently exerts a stronger effect than HBNC time, aligning with the notion that mandatory activities dominate residential trade-offs. HBNC time, serving as a proxy for amenity consumption and personal preferences, retains explanatory power even after explicit amenity controls, suggesting it captures both accessibility and preference heterogeneity. Robustness to endogeneity and historical dependence (time-lag) strengthens causal interpretation, and the model detects behaviour shifts under external shocks (COVID-19) by revealing increased relative weight of commuting cost, likely due to risk considerations reducing discretionary travel. The predictive analyses demonstrate practical applicability for forecasting residential distributions from prior travel patterns.
Conclusion
This paper proposes a new RLC model that integrates revealed home-based travel behaviour (commuting and HBNC time) with housing expenditure, unifying a gravity-based population perspective and an individual utility-maximisation perspective. Using large-scale mobile trajectory data from Beijing and Shenzhen (2018–2020), the study confirms residents minimise commuting time and maximise HBNC time, and shows the model is robust to control variables, spatial scale, endogeneity, and historical dependence. The model generalises across distinct urban forms, captures external shock impacts (COVID-19), and exhibits promising predictive performance. Contributions include: (1) explicitly embedding individual preferences via HBNC time, reducing reliance on extensive amenity inventories; (2) simplifying model structure while retaining behavioural richness; and (3) enabling analyses at both group and individual levels for planning and policy. Future research should extend variables beyond travel-related factors (e.g., noise, air quality), improve data fidelity through access to original trajectory records where possible, refine distinctions within optional travel types, and better measure out-of-home leisure to enhance explanatory and predictive power.
Limitations
The study acknowledges several limitations: (1) Omitted variables unrelated to travel (e.g., noise, air quality) may influence RLC but are not explicitly modelled. (2) Use of secondary, operator-processed trajectory data (not original call detail records) prevents independent quality verification, potentially affecting measurement accuracy. (3) The binary classification of travel as mandatory vs optional may oversimplify behaviour, reducing the precision of HBNC time as a proxy for amenities and preferences. (4) HBNC time may be underestimated due to unobserved or unlinked non-home leisure trips and co-occurrences of non-work site visits. (5) Some spatial assignment of housing prices relies on averaging within administrative units, which may mask within-area heterogeneity.

