Contrasting social and non-social sources of predictability in human mobility
Z. Chen, S. Kelty, et al.
The study investigates how much an individual's (ego's) future mobility can be predicted from two sources: their social ties and non-social colocators (people who visit the same places around the same times but are not socially connected). Building on evidence that human mobility exhibits regularities and that social structures influence movement, the research asks whether, and to what extent, social networks and aggregates of non-social colocators can substitute for an ego’s own mobility history in predicting future locations. This question is motivated by applications such as urban planning and epidemic control, and by privacy concerns arising from information flow through social and non-social channels.
Prior work shows human mobility has regularities such as bursty activity, preference for a few locations, and decreasing exploration over time. Predictability bounds of future locations from an individual's own history typically range from 70–90% depending on granularity. Social relationships account for an estimated 10–30% of human movement. Online social activity studies indicate that up to ~95% of an individual's potential predictive accuracy may be encoded in their social network, and movement patterns in virtual and physical spaces are similar. Location-based social networks (LBSNs) and mobile phone call detail records (CDRs) provide sequences of locations and, for LBSNs, social network data, enabling analyses of social and non-social mobility information. These strands suggest that both social ties and non-social co-visitation patterns could convey predictive information about an individual's mobility.
Data: Three public LBSN datasets and one private CDR dataset were used.
- BrightKite: 4,491,143 check-ins by 58,228 users (Apr 2008–Oct 2010).
- Weeplaces (Foursquare): 7,658,368 check-ins by 15,799 users (Nov 2003–Jun 2011).
- Gowalla: 6,442,890 check-ins by 196,591 users (Feb 2009–Oct 2010).
- CDR (Rio de Janeiro Metropolitan Area): 22,116,252 call records by 35,338 users across 1,835 antennas (Jan–Jun 2014).
After filtering out inactive or incomplete records and users with fewer than 150 events, the retained data (events/users) were: BrightKite 510,308/6,132; Weeplaces 924,666/11,533; Gowalla 850,094/9,937; CDR 1,382,626/4,415.
Network construction: For LBSNs, the explicit social graph was restricted to users with trajectories. Non-social colocation networks were built by linking ego-alter pairs that checked in at the same location within a temporal window (default 1 hour). Alters were ranked by frequency of colocation; only alters providing better-than-random information and non-spurious colocations were retained. For CDRs, social ties were inferred as reciprocal caller pairs with at least 30 reciprocal calls over the study window (≈weekly); non-social colocators were required to have zero calls with the ego; colocation definition matched LBSNs but with antenna locations. Analyses focusing on aggregate predictability used egos having at least 10 alters in both social and colocation networks.
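The colocation rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `colocation_counts` and `nonsocial_colocators`, the `(user, location, unix_time)` tuple layout, and the inclusive window boundary are all hypothetical, and the better-than-random and non-spurious filters mentioned in the text are omitted.

```python
from collections import defaultdict

def colocation_counts(checkins, window_s=3600):
    """Count colocation events per unordered user pair.

    checkins: iterable of (user, location, unix_time_seconds) tuples.
    Two visits to the same location colocate if they are at most
    window_s seconds apart (assumed inclusive; default 1 hour).
    """
    by_location = defaultdict(list)
    for user, loc, t in checkins:
        by_location[loc].append((t, user))

    counts = defaultdict(int)
    for visits in by_location.values():
        visits.sort()  # chronological order within each location
        for i, (t_i, u_i) in enumerate(visits):
            for t_j, u_j in visits[i + 1:]:
                if t_j - t_i > window_s:
                    break  # later visits are even further away
                if u_i != u_j:
                    counts[frozenset((u_i, u_j))] += 1
    return counts

def nonsocial_colocators(ego, counts, social_edges):
    """Rank an ego's colocators by colocation frequency, dropping social ties.

    social_edges: set of frozenset({u, v}) pairs from the explicit social graph.
    Returns a list of (alter, count), most frequent first.
    """
    ranked = []
    for pair, c in counts.items():
        if ego in pair:
            (alter,) = pair - {ego}
            if frozenset((ego, alter)) not in social_edges:
                ranked.append((alter, c))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

For the CDR data, the same sketch would apply with antenna identifiers as locations and with `social_edges` replaced by the set of pairs that exchanged any calls.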
Information-theoretic measures: The individual entropy rate S_A for a trajectory A of length N is estimated via a Lempel–Ziv-based nonparametric estimator, S_A = N log2(N) / Σ_i Λ_i, where Λ_i is the length of the shortest subsequence starting at position i that has not been seen previously in A. Predictability Π_A (an upper bound on the accuracy of an ideal next-location predictor) is derived by inverting Fano's inequality: S_A ≤ H(Π_A) + (1−Π_A) log2(n−1), where H is the binary entropy function and n is the number of distinct locations.
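As a rough illustration of these two steps, the sketch below computes the Lempel–Ziv entropy estimate and then solves Fano's inequality (taken as an equality) for the predictability by bisection. The function names are hypothetical, and two conventions are assumptions rather than details from the study: a previous occurrence must lie entirely before position i, and a match that runs to the end of the trajectory is given length (remaining length + 1).

```python
import math

def _contains(seq, sub):
    """True if list `sub` occurs as a contiguous run inside list `seq`."""
    n, m = len(seq), len(sub)
    return any(seq[j:j + m] == sub for j in range(n - m + 1))

def lz_entropy(traj):
    """Entropy-rate estimate in bits: N * log2(N) / sum of match lengths."""
    n = len(traj)
    total = 0
    for i in range(n):
        k = 1
        # Grow the subsequence until it has not appeared before position i.
        while i + k <= n and _contains(traj[:i], traj[i:i + k]):
            k += 1
        total += k
    return n * math.log2(n) / total

def fano_predictability(s, n, tol=1e-10):
    """Solve s = H(p) + (1 - p) * log2(n - 1) for p in [1/n, 1]."""
    if n < 2 or s <= 0:
        return 1.0
    if s >= math.log2(n):
        return 1.0 / n  # entropy at or above the uniform limit

    def gap(p):
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(n - 1) - s

    lo, hi = 1.0 / n, 1.0 - 1e-12  # gap is decreasing on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

A typical call would be `fano_predictability(lz_entropy(traj), len(set(traj)))`. The quadratic substring search is chosen for readability; long trajectories would need a faster matching structure.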
Cross-entropy between ego A and alter B measures the information about A contained in B’s past: S_{A|B} = N_A log2(N_B) / Σ_i Λ_i(A|B), where N_A and N_B are the lengths of the two trajectories and Λ_i(A|B) is the length of the shortest subsequence of A starting at position i that has not been seen in B prior to time t_i. Cross-predictability Π_{A|B} follows from applying Fano’s inequality to S_{A|B}.
Cumulative cross-entropy across a set of alters ℬ adapts the cross-parsing to pick the longest match across any alter: S_{A|ℬ} = N_A log2(N_{Aℬ}) / Σ_i Λ_i(A|ℬ), with Λ_i(A|ℬ) = max_{B∈ℬ} Λ_i(A|B), where Λ_i(A|B) is the pairwise match length at position i and N_{Aℬ} is the weighted average alter sequence length. The corresponding cumulative cross-predictability Π_{A|ℬ} is computed via Fano’s inequality. Curves were evaluated as alters were accumulated in order of decreasing colocation rank, typically up to the top 10.
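The pairwise and cumulative cross-entropy estimators can be sketched together, since the cumulative version only takes the longest match over alters at each position. The code below is an assumption-laden illustration, not the authors' implementation: function names are hypothetical, trajectories are assumed to be time-sorted lists of `(time, location)`, and the weighting of alter lengths (each alter weighted by the number of positions where it attains the longest match) is one plausible reading of "weighted average alter sequence length".

```python
import math
from bisect import bisect_left

def _contains(seq, sub):
    """True if list `sub` occurs as a contiguous run inside list `seq`."""
    n, m = len(seq), len(sub)
    return any(seq[j:j + m] == sub for j in range(n - m + 1))

def match_lengths(ego, alter):
    """Per-position cross-parsing match lengths of ego against alter's past."""
    ego_locs = [loc for _, loc in ego]
    alter_times = [t for t, _ in alter]
    alter_locs = [loc for _, loc in alter]
    n = len(ego_locs)
    lengths = []
    for i, (t_i, _) in enumerate(ego):
        # Only alter visits strictly before the ego's i-th visit are usable.
        past = alter_locs[:bisect_left(alter_times, t_i)]
        k = 1
        while i + k <= n and _contains(past, ego_locs[i:i + k]):
            k += 1
        lengths.append(k)
    return lengths

def cross_entropy(ego, alter):
    """Pairwise estimate: N_A * log2(N_B) / sum of match lengths."""
    return len(ego) * math.log2(len(alter)) / sum(match_lengths(ego, alter))

def cumulative_cross_entropy(ego, alters):
    """Cumulative estimate using the longest match over all alters."""
    per_alter = [match_lengths(ego, b) for b in alters]
    best = [max(column) for column in zip(*per_alter)]
    # Assumed weighting: how often each alter attains the longest match.
    weights = [sum(1 for l, m in zip(ls, best) if l == m) for ls in per_alter]
    n_avg = sum(w * len(b) for w, b in zip(weights, alters)) / sum(weights)
    return len(ego) * math.log2(n_avg) / sum(best)
```

Either estimate can then be converted to a cross-predictability by solving Fano's inequality numerically with the ego's number of distinct locations.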
Spatial overlap metrics: Overlapped Distinct Location Ratio (ODLR) η_{AB} = |Y_A ∩ Y_B| / |Y_A| quantifies the fraction of an ego’s unique locations also visited by an alter. Cumulative ODLR (CODLR) across multiple alters uses the union of alter location sets: CODLR_{A|B} = |⋃_{i∈B} (Y_A ∩ Y_i)| / |Y_A|.
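Both overlap ratios reduce to set operations on visited locations. The following is a small sketch with hypothetical function names, assuming each trajectory has already been reduced to an iterable of location identifiers.

```python
def odlr(ego_locs, alter_locs):
    """Fraction of the ego's distinct locations also visited by one alter."""
    ego = set(ego_locs)
    return len(ego & set(alter_locs)) / len(ego)

def codlr(ego_locs, alters_locs):
    """Fraction of the ego's distinct locations visited by at least one alter."""
    ego = set(ego_locs)
    covered = set()
    for locs in alters_locs:
        covered |= ego & set(locs)
    return len(covered) / len(ego)

ego = ["home", "work", "gym", "cafe", "work"]
alter_1 = ["work", "cafe", "bar"]
alter_2 = ["gym", "cafe"]
print(odlr(ego, alter_1))              # 0.5
print(codlr(ego, [alter_1, alter_2]))  # 0.75
```

Because the cumulative ratio is built from a union, it can only stay flat or grow as alters are added, which is why a saturating shape is the natural expectation.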
Temporal displacement analysis: Time-displaced colocation networks were built by linking ego-alter pairs in which the alter visited the same location between T−1/2 and T hours before, or between T−1/2 and T hours after, the ego’s visit, for T from 0.5 h to 12 h in 30-minute increments. Only egos with ≥10 alters at every T were analyzed (Weeplaces and CDR).
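One way to read the offset windows is as half-hour bands of absolute time gap between the two visits. The sketch below encodes that reading; the function names and the inclusive band boundaries are assumptions made for illustration.

```python
def offsets_hours(start=0.5, stop=12.0, step=0.5):
    """Offsets T from 0.5 h to 12 h in 30-minute increments."""
    count = int(round((stop - start) / step)) + 1
    return [start + k * step for k in range(count)]

def in_displaced_window(t_ego_h, t_alter_h, offset_h, width_h=0.5):
    """True if the visits are between (T - 1/2) and T hours apart.

    Times are in hours; the alter may come before or after the ego.
    Both band edges are treated as inclusive (an assumption).
    """
    gap = abs(t_ego_h - t_alter_h)
    return offset_h - width_h <= gap <= offset_h

print(len(offsets_hours()))                  # 24 offsets
print(in_displaced_window(10.0, 8.25, 2.0))  # True: gap of 1.75 h
print(in_displaced_window(10.0, 9.0, 2.0))   # False: gap of 1.0 h
```

With T = 0.5 h the band collapses to gaps of at most half an hour, so the smallest offset behaves like an ordinary, undisplaced colocation window.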
Statistical tests and ranking: Alters were ranked by colocation frequency. Correlations were assessed with Pearson or Spearman coefficients as appropriate, and paired one-sided t-tests compared social and colocator predictability. Saturation of cumulative predictability with an increasing number of alters was modeled with nonlinear saturating fits to estimate the asymptotic predictability Π_∞.
Baseline individual predictability and entropy:
- Entropy S_A distributions peak around 4–5 bits for Gowalla and Weeplaces; ≈1 bit for BrightKite (many users visit 1–3 locations); ≈2 bits for CDR (bounded urban area, coarse spatial resolution). For Gowalla and Weeplaces, 4–5 bits corresponds to a perplexity (2^S_A) of about 16–32 equally likely next locations, a large reduction from a user’s total number of distinct locations.
- Predictability Π_A distributions: BrightKite has a spike near Π_A ≈ 1 for low-entropy users; Gowalla peaks ≈40% with wide spread; Weeplaces peaks ≈50%; CDR peaks ≈75%.
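The perplexity figures quoted with the entropy peaks follow directly from raising 2 to the entropy in bits. A short check, using a hypothetical helper name:

```python
def perplexity(entropy_bits):
    """Effective number of equally likely next locations: 2 ** entropy."""
    return 2 ** entropy_bits

# Entropy peaks reported above, in bits.
peaks = {"Gowalla/Weeplaces (low)": 4, "Gowalla/Weeplaces (high)": 5,
         "CDR": 2, "BrightKite": 1}
for name, bits in peaks.items():
    print(f"{name}: {bits} bits -> {perplexity(bits)} locations")
# 4 and 5 bits give 16 and 32; 2 bits gives 4; 1 bit gives 2.
```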
Pairwise and aggregate information from alters (Weeplaces examples):
- Cross-entropy/predictability for top alter: Top social tie S_{A|B} median 8.17 bits; top non-social colocator median 8.46 bits. Corresponding cross-predictability Π_{A|B} medians: social 17.43%; colocator 12.35%.
- Aggregating colocators increases information: Top-3 non-social colocators S_{A|B} median 8.02 bits; Π_{A|B} median 19.60%, exceeding the top social tie’s median predictability.
- Positive association between ego predictability and top alter predictability (Fig. S9).
Cumulative alters (up to top-10):
- Predictability increases as more alters are added (positive Spearman correlation across 88.94% of users, p<0.05). For a fixed number of alters, social ties outperform colocators on average (lower cross-entropy, higher cross-predictability). 94.47% of egos show significantly higher predictability from social ties than from colocators (paired one-sided t-test, p<0.01).
- Nevertheless, more colocators can match or exceed fewer social ties: In Weeplaces, top-3 colocators exceed top-1 social tie; top-7 colocators exceed top-2 social ties. Across datasets, fewer than 10 colocators can equal the information of the top social tie.
Extrapolated limits and replacement potential:
- Saturation extrapolation (Weeplaces): Π_∞ ≈ 44.32% (social ties) and 39.79% (colocators). Relative to the average ego self-predictability Π_ego = 47.05%, this implies that up to ~94% (social ties) and ~85% (colocators) of an ego’s potential predictability is available from alters alone. Including the ego’s own past raises predictability to ≈56.70% (with social ties) and ≈56.25% (with colocators), a relative gain of ≈20.5% and ≈19.5% over the ego’s self-predictability, respectively.
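The percentages in this extrapolation are ratios against the ego self-predictability of 47.05%. The arithmetic can be reproduced as follows; the constant and function names are illustrative only, and the numbers are the ones reported above.

```python
EGO_SELF = 47.05  # average ego self-predictability, in percent

def share_of_ego(alters_only, ego_self=EGO_SELF):
    """Fraction of the ego's predictability recoverable from alters alone."""
    return alters_only / ego_self

def relative_gain(combined, ego_self=EGO_SELF):
    """Relative improvement when alters are added to the ego's own past."""
    return combined / ego_self - 1

print(f"social ties alone: {share_of_ego(44.32):.1%}")       # about 94%
print(f"colocators alone:  {share_of_ego(39.79):.1%}")       # about 85%
print(f"ego + social ties: +{relative_gain(56.70):.1%}")     # about 20.5%
print(f"ego + colocators:  +{relative_gain(56.25):.1%}")     # about 19.5%
```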
Relative counts needed (predictability ratio analyses):
- Number of colocators equivalent to top social tie: BrightKite 1–2; Gowalla 7–8; Weeplaces 3–4; CDR 1–2. Thus, aggregates of colocators can substitute for fewer social ties, typically within 10 colocators.
Spatial overlap as a driver of information:
- ODLR declines monotonically with alter rank; trend stronger for social ties than colocators across datasets.
- CODLR increases with more alters and saturates around: social ties 30–40% and colocators 15–30% (LBSNs); for CDR, higher saturation (~80% social; ~65% colocators) due to coarse spatial resolution.
- Strong linear relationship between CODLR and cumulative cross-predictability: Pearson R≈0.66 (social) and 0.67 (colocators), p<0.001.
Temporal displacement robustness:
- Cross-predictability from non-social colocators shows little to no degradation across temporal offsets T from 0.5 h to 12 h for a fixed number of alters, with only slight decreases at longer offsets. Sets of colocators differ across lags, yet information content remains comparable, indicating that exact simultaneity of visits is not necessary to obtain predictive mobility information.
Privacy-relevant implication:
- Predictive information about an individual’s future movements is embedded not only in their social network but also in aggregates of non-social colocators, raising concerns that mobility data sharing can leak information about individuals who have not shared their data.
The findings demonstrate that an individual’s mobility can be substantially predicted from the mobility histories of others. Social ties generally carry more predictive information per alter than non-social colocators, confirming the role of social structure in shaping movement. However, the aggregate of a modest number of colocators can match or exceed the information provided by a few social ties, and extrapolations show that up to ~85% of an ego’s potential predictability resides in non-social colocators alone. The strong link between distinct-location overlap and cross-predictability indicates that shared place repertoires are a primary mechanism for information transfer, and temporal-displacement analyses reveal that simultaneity is not required—people visiting the same places at different times can still provide comparable predictive information.
These results address the core question by showing that both social and non-social sources can bound and approximate an ego’s predictability without access to the ego’s own history. This has practical relevance for mobility modeling, urban planning, and public health (e.g., contact tracing), particularly when social or individual-level data are missing. It also heightens privacy concerns: mobility patterns of a few users can reveal substantial information about others, suggesting that data governance must consider collective privacy risks and not solely individual consent.
This work introduces a colocation-based framework to disentangle social from non-social sources of predictive information in human mobility and applies nonparametric entropy and cross-entropy estimators across diverse datasets (three LBSNs and CDRs). It shows that while social ties are individually more informative, aggregates of non-social colocators can approach or match the predictive power of social ties, with extrapolated bounds indicating up to ~94% (social) and ~85% (non-social) of ego predictability recoverable from alters. Spatial overlap of unique locations drives information transfer, and predictive power is robust to temporal displacement of visits.
Future research directions include: integrating socioeconomic and demographic variables to understand drivers of colocator-based predictability; leveraging richer, higher-resolution and more representative datasets; experimental designs to validate causality and mechanisms; and developing privacy-preserving algorithms and policies that address collective inference risks from mobility data.
The study relies on observational datasets with inherent biases: LBSN users are not representative of the general population; check-ins under-sample true mobility; reported social networks are incomplete and may not reflect offline ties; CDR data have coarse spatial resolution (cell-tower areas) and are bounded to one metropolitan region. Social ties in CDRs are inferred via reciprocal call thresholds, which may miss or misclassify relationships. Filtering (e.g., N≥150 events) may exclude less active users. These factors can affect generalizability and the absolute values of entropy and predictability, though cross-dataset robustness checks mitigate concerns.

