
Unraveling the association between socioeconomic diversity and consumer price index in a tourism country
Y. Leng, N. A. Babwany, et al.
Yan Leng, Nakash Ali Babwany, and Alex Pentland report a strong association between diversity measures derived from mobile phone data and the Consumer Price Index (CPI) in Andorra. The results point toward detailed CPI maps that could improve understanding of economic trends and of how diversity relates to them.
Introduction
The study asks whether micro-level socio-demographic diversity—measured from tourists’ behaviors captured via mobile phone records—predicts macro-level economic indicators, specifically the Consumer Price Index (CPI), in a tourism-focused country. The context is Andorra, whose economy relies heavily on international tourism. Prior work suggests diversity can influence productivity, innovation, and economic outcomes, yet the link between micro-level diversity and CPI remains unquantified due to data limitations. The purpose is to quantify associations between diversity of nationality and income (proxied via phone prices) around different types of points of interest (POIs) and both general and sectoral CPIs. The importance lies in enabling higher-frequency, spatially resolved monitoring of inflation-related indicators to complement official statistics and support timely policymaking.
Literature Review
The paper situates its contribution within four literatures:
- Diversity and economic performance: studies report both positive (e.g., innovation, productivity) and negative (e.g., resource allocation frictions) effects of ethnic and cultural diversity on economic growth. Diversity can deflate financial bubbles and benefit corporate outcomes, while its effects on housing prices are mixed.
- Macro nowcasting: prior CPI and inflation forecasting has relied on financial market data and factor models, and traditional statistics face reporting lags and sampling limitations.
- Big data for socioeconomic measurement: satellite imagery combined with surveys has been used to predict poverty, mobile phone data combined with environmental data has been used to predict multidimensional poverty, and the Billion Prices Project constructed a daily CPI from online prices.
- Mobile phone data applications: high penetration and coverage enable studies in epidemiology, mobility, tourism analytics, and COVID-19 contact tracing.
This study extends these strands by linking tourist-based diversity measures to CPI in a tourism country and exploring the potential for high-frequency, spatially resolved CPI proxies.
Methodology
Setting and data: Andorra, a European tourism country (population ~85,000; ~10.2 million annual international visitors). Call detail records (CDRs) from the sole national carrier cover 100% of devices connecting in Andorra from July 2014 to August 2016. CDRs include timestamps, serving cell tower location (lat/lon), SIM country of registration, and handset attributes (brand, vendor, model, OS). Cell coverage is approximated using Voronoi tessellation. There are ~100 cell towers, each covering ~250 m to 2 km. Towers were manually labeled with up to eight POI categories: wellness, leisure, shop, gastronomic (food), nature, event, culture, and others. All predictors are z-normalized for comparability.
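As a rough illustration of the Voronoi approximation: a location lies in the Voronoi cell of whichever tower is nearest to it, so cell membership reduces to a nearest-tower lookup. The sketch below assumes great-circle (haversine) distance; the tower identifiers and coordinates are hypothetical and are not taken from the study.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def voronoi_cell(point, towers):
    """Return the id of the tower whose Voronoi cell contains `point`.

    A point belongs to the cell of its nearest site, so this is a
    nearest-tower search over the tower dictionary.
    """
    lat, lon = point
    return min(towers, key=lambda tid: haversine_km(lat, lon, *towers[tid]))

# Hypothetical tower coordinates (illustrative only).
towers = {
    "tower_a": (42.5063, 1.5218),
    "tower_b": (42.5440, 1.7340),
    "tower_c": (42.4640, 1.4910),
}
nearest = voronoi_cell((42.5100, 1.5300), towers)
```

In CDRs the serving tower is already recorded, so a lookup like this would matter mainly when mapping other point data, such as POI locations, onto tower cells.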
Diversity measures: Diversity is measured via Shannon entropy (balance dimension) using two categorical attributes: (a) nationality based on SIM registration, and (b) income proxy via handset price bins. Diversity is computed per tower i per day r and then averaged by POI category.
- Nationality diversity at tower i on day r: D_nat,i,r = (1/log(K+)) * ( - Σ_k (T_{i,k,r}/T_{i,r}) * log(T_{i,k,r}/T_{i,r}) ), where T_{i,r} is the total individuals at tower i on day r; T_{i,k,r} is count from nationality k; K+ is the count of nationalities present that day; K (universe) = 10 (Andorra, Spain, France, Netherlands, Belgium, Russia, UK, Germany, Portugal, others).
- Income diversity at tower i on day r: D_inc,i,r = (1/log(S+)) * ( - Σ_s (T_{i,s,r}/T_{i,r}) * log(T_{i,s,r}/T_{i,r}) ), where handset price categories S = 14 bins in USD: [0–20], [20–30], [40–50], [50–100], [100–150], [150–200], [250–300], [300–400], [400–500], [500–600], [600–700], [700–800], [800–900], >900; S+ is the number of bins present that day; T_{i,s,r} counts users in bin s.
- Aggregation by POI b: average nationality diversity Div_nat,b,r = (1/|C(b)|) Σ_{j∈C(b)} D_nat,j,r and similarly Div_inc,b,r for income, where C(b) is the set of towers with POI b. Towers may belong to multiple POIs.
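The three definitions above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, tower ids, and counts are hypothetical, and returning 0 when only one category is present (where the normalizer log(1) would be zero) is an assumption of this sketch.

```python
import math
from collections import defaultdict

def normalized_entropy(counts):
    """Shannon entropy of category counts divided by log(#categories present).

    Categories with zero count are skipped, matching the K+ / S+ normalizer.
    Assumption: with fewer than two categories present, return 0.0.
    """
    present = [c for c in counts if c > 0]
    if len(present) < 2:
        return 0.0
    total = sum(present)
    h = -sum((c / total) * math.log(c / total) for c in present)
    return h / math.log(len(present))

def poi_average(tower_diversity, tower_pois):
    """Average tower-level diversity over the towers labeled with each POI.

    A tower labeled with several POIs contributes to each of them.
    """
    by_poi = defaultdict(list)
    for tower, d in tower_diversity.items():
        for poi in tower_pois[tower]:
            by_poi[poi].append(d)
    return {poi: sum(v) / len(v) for poi, v in by_poi.items()}

# Hypothetical one-day SIM-country counts per tower (illustrative only).
day_counts = {
    "t1": {"Spain": 50, "France": 50},
    "t2": {"Spain": 90, "France": 10},
    "t3": {"Spain": 100},
}
tower_pois = {"t1": ["leisure", "shop"], "t2": ["shop"], "t3": ["nature"]}

d_nat = {t: normalized_entropy(c.values()) for t, c in day_counts.items()}
div_nat = poi_average(d_nat, tower_pois)
```

The same `normalized_entropy` function applies to income diversity when the counts are taken over handset price bins instead of SIM countries.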
CPI data: Monthly CPIs from Andorra Government Statistics, reported relative to 2001. Sectoral CPIs are grouped as: tourism-related (1) hotels, cafes, restaurants; (2) food, drink, tobacco; and resident-related (1) transport; (2) clothes and shoes; (3) residence-related services (rental, utilities, conservation products/services, water/sewer, electricity, gas/flammables); (4) public and social security administration; (5) furniture, products and services for home; (6) health. The analysis focuses on relative month-over-month change: ΔCPI_t = (CPI_{t+1} − CPI_t)/CPI_t.
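The relative change defined above is a one-line computation. The index values below are hypothetical, chosen only to illustrate the formula.

```python
def relative_change(cpi):
    """Month-over-month relative change: (CPI[t+1] - CPI[t]) / CPI[t]."""
    return [(nxt - cur) / cur for cur, nxt in zip(cpi, cpi[1:])]

# Hypothetical monthly index values (illustrative only).
cpi = [100.0, 101.0, 100.5, 102.0]
delta = relative_change(cpi)
```

The resulting series is one element shorter than the input, because the last month has no successor to compare against.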
Analytical approach:
- Compute Pearson correlations between diversity measures (country-level and POI-specific for nationality and income) and general/sectoral CPIs.
- Use elastic net regression to select a parsimonious subset of diversity covariates for each CPI category and to estimate predictive performance (R^2). Hyperparameters λ (penalty) and α (mixing between Ridge α=0 and Lasso α=1) are tuned over λ ∈ [10^-3, 10^3], α ∈ [0,1]. Reported optimal (λ, α): general (0.419, 0.105), hotels & restaurants (0.296, 0.421), food, drinks & tobacco (0.470, 0), clothes & shoes (0.944, 0), transport (0.117, 0.895), residence-related services (0.117, 0.053), furniture/home products & services (1.501, 0.105), health (0.052, 0.053).
- Nowcasting: Using selected covariates, produce daily nowcasts of CPI measures and spatial maps of predicted CPI at the cell tower level. Perform community detection via spectral clustering on pairwise correlations of tower-level predicted CPIs to identify regional groupings.
Preprocessing and visualization include plotting correlations among sectoral CPIs and scatter plots of top diversity–CPI associations.
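The correlation and elastic net steps can be sketched with the standard library alone. The summary does not state which software or exact objective the authors used, so the code assumes a glmnet-style objective, (1/2n)·RSS + λ[α·|β|₁ + (1−α)/2·|β|₂²], fitted by coordinate descent on z-normalized predictors. All function names and data here are hypothetical.

```python
import math

def zscore(col):
    """Standardize a column to mean 0 and population standard deviation 1."""
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
    return [(v - mu) / sd for v in col]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    return sum(a * b for a, b in zip(zscore(x), zscore(y))) / len(x)

def soft_threshold(z, g):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    return math.copysign(max(abs(z) - g, 0.0), z)

def elastic_net(cols, y, lam, alpha, n_iter=500):
    """Coordinate descent for the assumed glmnet-style objective.

    `cols` must be z-scored predictor columns, so (1/n) * sum(x_j^2) == 1.
    """
    n, p = len(y), len(cols)
    b0 = sum(y) / n                      # intercept, since predictors are centered
    beta = [0.0] * p
    resid = [v - b0 for v in y]          # residual under the current coefficients
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # covariance of x_j with the partial residual (own term added back)
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            if new != beta[j]:
                diff = new - beta[j]
                resid = [resid[i] - diff * xj[i] for i in range(n)]
                beta[j] = new
    return b0, beta

def predict(cols, b0, beta):
    """Fitted values for z-scored predictor columns."""
    return [b0 + sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(cols[0]))]

def r_squared(y, y_hat):
    """Coefficient of determination."""
    mu = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mu) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical diversity covariates and response (illustrative only).
x1 = [0.1, 0.4, 0.3, 0.8, 0.6, 0.9, 0.2, 0.7]
x2 = [0.5, 0.2, 0.9, 0.1, 0.7, 0.3, 0.8, 0.4]
y = [2 * a + 1 for a in x1]              # response driven by x1 only
cols = [zscore(x1), zscore(x2)]

b0, beta = elastic_net(cols, y, lam=0.01, alpha=0.5)
fit_r2 = r_squared(y, predict(cols, b0, beta))
```

Setting `alpha=0` reduces the penalty to Ridge and `alpha=1` to Lasso, matching the mixing convention described above. In practice λ and α would be chosen by cross-validation over a grid rather than fixed by hand as in this toy example.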
Key Findings
- Strong association between income diversity and general CPI: diversity of income at leisure POIs (r = 0.805, p < 0.001) and at nature POIs (r = 0.775, p < 0.001) correlates highly with general CPI.
- Country-level nationality diversity correlates negatively with several CPIs: general (r = −0.650, p < 0.001), food/drinks/tobacco (r = −0.635, p < 0.001), clothes/shoes (r = −0.765, p < 0.001); and positively with hotels/restaurants (r = 0.751, p < 0.001).
- Country-level income diversity correlates positively with: general CPI (r = 0.743, p < 0.001), transport (r = 0.682, p < 0.001), and residence-related services (r = 0.650, p < 0.001). A weaker positive correlation is seen for furniture/home products/services (e.g., r ≈ 0.395, p < 0.05).
- POI-specific nationality diversity measures correlate strongly with hotels/restaurants CPI (e.g., diversity at shops, r ≈ 0.756); nationality diversity at leisure, nature, and other POIs also shows strong positive correlations with this sector. By contrast, nationality diversity at the country level and at shopping, food, and culture POIs correlates negatively with clothes/shoes CPI (r ≈ −0.750 to −0.785).
- Opposing roles of diversity types: Income diversity generally correlates positively with CPIs (inflationary), while nationality diversity often correlates negatively (deflationary) for several sectors, with the notable exception of hotels/restaurants where it is positive.
- Elastic net predictive performance (R^2): General 0.74; Hotels & Restaurants 0.57; Food & Drinks 0.61; Clothes & Shoes 0.71; Transport 0.56; Residence & Services 0.78; Furniture & Home Products 0.28; Health Related Services 0.74.
- Parsimony: A small set of covariates suffices for certain sectors (e.g., four covariates for transport and for hotels/restaurants), whereas broader sets improve predictions for clothes/shoes and food/drinks/tobacco.
- Nowcasting: Daily CPI nowcasts capture periodic patterns (e.g., strong seasonality in clothes/shoes) and trends (e.g., health services increasing in 2015). Spatial maps reveal regional heterogeneity in predicted CPI across cell towers and cluster structures not solely explained by geographic proximity.
Discussion
Findings demonstrate that micro-level socio-demographic diversity derived from mobile phone data is strongly associated with macroeconomic indicators like CPI at both general and sectoral levels in a tourism economy. Income diversity, especially around leisure and nature POIs, aligns with higher general CPI, while nationality diversity often aligns with lower CPI for several resident-related sectors but higher CPI for tourism-centric sectors (hotels/restaurants). These relationships enable high-frequency nowcasting and spatially resolved CPI estimation, offering policymakers timely and granular insights beyond national averages. Potential mechanisms include economic vibrancy and demand from diverse tourists, increased service provision to meet heterogeneous preferences, expanded international marketing, and social learning among tourists that broadens consumption. Although causality is not established, the consistent associations across sectors and robust predictive performance suggest diversity as a valuable proxy for inflation dynamics in tourism contexts. The approach illustrates how passively collected behavioral data can complement lagged official statistics for more responsive economic monitoring and regional policy design.
Conclusion
This work provides the first empirical case linking tourist-based socio-demographic diversity measures from mobile phone data to general and sectoral CPIs in a tourism country (Andorra). It contributes by: (1) defining and computing nationality and income diversity at fine spatiotemporal scales; (2) establishing strong, sector-specific associations with CPI; (3) building elastic net models that achieve solid predictive performance; and (4) producing daily nowcasts and spatial CPI maps at cell tower resolution to complement official statistics. Future research should examine external validity in other tourism and non-tourism countries, refine theoretical underpinnings for the observed associations, improve income proxies beyond handset prices, and develop causal identification strategies to inform policy interventions.
Limitations
- Causality is not established; the analysis is associative using observational data.
- Single-country case (Andorra) with a tourism-heavy economy limits generalizability; external validity to other contexts remains to be tested.
- Income is proxied by handset price categories, which may imperfectly reflect disposable income.
- Diversity is captured via Shannon entropy (balance) and does not incorporate variety or disparity dimensions.
- Data sharing constraints prevent public release of underlying CDRs; replication relies on similar partnerships or alternative datasets.
- POI labeling is manual and towers can belong to multiple POIs, potentially introducing classification ambiguity.