Introduction
The first UN Sustainable Development Goal (SDG) aims to eradicate poverty. Accurate, regularly updated poverty maps are crucial for effective poverty alleviation strategies in low- and middle-income countries (LMICs). Traditional methods that rely on census and household survey data (e.g., small area estimation, or SAE) are often limited by infrequent updates (typically every 10 years) and coarse spatial resolution. Furthermore, data availability and reliability can be problematic in many LMICs. This research explores the use of call detail records (CDRs) from mobile phone networks as an alternative data source. CDRs are passively collected and offer high spatial and temporal resolution; they capture user location, social networks, and call behavior, all of which are correlated with socioeconomic status. While CDR data have limitations, such as biases towards certain demographics (educated, male, urban, wealthier individuals) and incomplete coverage, studies have shown their usefulness in inferring socioeconomic status. This study aims to determine whether CDR-derived features can be generalized across multiple settings to predict poverty and wealth, promoting broader use of mobile data for poverty estimation and complementing existing methods. Success in this endeavor would reduce the cost and time associated with socioeconomic data collection and improve the frequency and precision of the poverty estimates needed to meet the 2030 SDG agenda.
Literature Review
Existing research has demonstrated the potential of CDR data to infer socioeconomic status within individual countries. Studies have used various CDR-derived features, such as call frequency, airtime recharge patterns, and mobility patterns, to create poverty maps. However, the generalizability of these findings across different contexts has not been established. This study builds on prior work by aiming to identify a common set of CDR features that are predictive of poverty across diverse geographical settings. The study also acknowledges the inherent limitations and biases in CDR data, including self-selection bias (where the poorest individuals may not own mobile phones) and coverage limitations, particularly in rural areas. Existing literature highlights the challenges associated with data access, privacy concerns, and the need for transparent and verifiable methods to enable the broader use of CDRs for development purposes. This study contributes to addressing this gap by evaluating the generalizability of CDR-based poverty estimation across multiple countries.
Methodology
This study uses data from three LMICs: Namibia, Nepal, and Bangladesh. CDR data were obtained from the leading mobile network operators (MNOs) in each country: MTC (Namibia), Ncell (Nepal), and Grameenphone (Bangladesh). The study used Demographic and Health Survey (DHS) data to provide ground truth on socioeconomic status, specifically using the DHS wealth index as a measure of poverty.
The spatial scale of the analysis was based on Voronoi tessellations approximating the coverage area of each mobile phone tower. Aggregate CDR features were calculated for each Voronoi polygon, representing the mean, sum, or mode of the corresponding data. DHS data were matched to Voronoi polygons based on the centroid of each DHS cluster. The CDR-derived features included measures of mobility (number of places visited, entropy of places, radius of gyration), call patterns (outgoing/incoming call and text counts, percentage of nocturnal calls, call/text durations), and social network features (interactions per contact, entropy of contacts). In Bangladesh, additional data on revenue and consumption were available (top-up amounts and frequency).
A key aspect of the methodology involved addressing multicollinearity among the CDR features, using both Pearson correlation and variance inflation factor (VIF) to select non-collinear variables. Hierarchical Bayesian areal models using Integrated Nested Laplace Approximations (INLA) were employed to model the relationship between poverty (DHS wealth index) and the CDR features. The models accounted for spatial autocorrelation using a Besag model. Models were built using 70% of the data, with the remaining 30% used for out-of-sample cross-validation to evaluate model performance (using R-squared and RMSE). Two sets of models were developed for each country: a ‘full model’ using all available non-collinear CDR features, and a ‘generalized model’ using only the features common to all three countries.
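This summary does not define the mobility features it names. As a minimal sketch, assuming the definitions commonly used in the mobility literature (radius of gyration as the root-mean-square distance of a user's tower visits from their centre of mass, and entropy of places as the Shannon entropy of visit frequencies), the three measures could be computed for one user as follows. The function names, the input layout, and the use of planar coordinates in kilometres are assumptions made for illustration and are not the authors' implementation.

```python
import math
from collections import Counter


def number_of_places(visits):
    """Count distinct tower locations in a user's records."""
    return len(set(visits))


def radius_of_gyration(visits):
    """Root-mean-square distance of visits from their centre of mass.

    `visits` is a list of (x, y) tower coordinates, one entry per record,
    assumed to be in a planar projection (e.g. kilometres).
    """
    n = len(visits)
    cx = sum(x for x, _ in visits) / n
    cy = sum(y for _, y in visits) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in visits) / n
    return math.sqrt(mean_sq)


def entropy_of_places(visits):
    """Shannon entropy (natural log) of the share of records at each tower."""
    n = len(visits)
    counts = Counter(visits)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical user who splits their records evenly between two towers.
example_visits = [(0.0, 0.0), (0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
print(number_of_places(example_visits))
print(radius_of_gyration(example_visits))
print(entropy_of_places(example_visits))
```

Per-user values like these would then be aggregated (for example, averaged) over the users assigned to each Voronoi polygon, in line with the aggregation step described above.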
Finally, the total number of people predicted to be in poverty was calculated using WorldPop population data, to compare the spatial distributions of poverty estimated by both the full and generalized models. The DHS classifies the lowest two quintiles as poor (poorest and poorer); this was used to estimate the overall number of people in poverty.
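The headcount step can be sketched as below. This is a simplified illustration under stated assumptions: the cutoff is taken as the 40th percentile of the surveyed wealth-index scores (the upper bound of the two lowest quintiles), every polygon whose predicted score falls at or below that cutoff is counted as poor, and its whole population is added to the total. The function names and input layout are hypothetical, and the unweighted percentile is a simplification; the summary does not state how the authors applied the quintile threshold.

```python
from statistics import quantiles


def poverty_cutoff(survey_scores):
    """Upper bound of the second-lowest quintile of wealth-index scores.

    quantiles(..., n=5) returns four cut points; the second one is the
    40th percentile, i.e. the boundary below which the two lowest
    quintiles ("poorest" and "poorer") fall.
    """
    return quantiles(survey_scores, n=5)[1]


def people_in_poverty(polygons, cutoff):
    """Sum the population of polygons predicted at or below the cutoff.

    `polygons` is a list of (predicted_wealth_index, population) pairs,
    one per Voronoi polygon (population taken from a gridded source such
    as WorldPop, already summed per polygon).
    """
    return sum(pop for score, pop in polygons if score <= cutoff)


# Hypothetical inputs, not data from the study.
survey_scores = [-1.2, -0.8, -0.5, -0.1, 0.0, 0.3, 0.6, 0.9, 1.4, 2.0]
predicted = [(-0.9, 12000), (0.4, 30000), (-0.3, 8000)]

cutoff = poverty_cutoff(survey_scores)
print(cutoff, people_in_poverty(predicted, cutoff))
```

Running the same function on the predictions of the full and the generalized model gives two totals that can be compared directly, which is the comparison described above.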
Key Findings
The study found that five common CDR-derived features (number of unique towers visited, outgoing call count, percent nocturnal communications, radius of gyration, and entropy of places) explained 50-65% of the variance in socioeconomic status across all three countries. Model performance, as measured by R-squared and RMSE, was similar between the full and generalized models in Namibia and Nepal, indicating the generalizability of these five features. However, in Bangladesh, the full model, which included additional country-specific features (text counts, top-up data, multimedia messaging, and internet usage), performed significantly better than the generalized model. The number of unique towers visited and percent nocturnal calls were consistently strong predictors across all three countries. Other significant predictors varied across countries: outgoing call counts (Namibia), radius of gyration and entropy of places (Nepal), and various communication and consumption indicators (Bangladesh). The analysis of the spatial distribution of poverty showed that both the total number of people predicted as poor and their geographic distribution depended on whether the full or generalized model was used. In Namibia, the difference was minor. In Bangladesh, the full model predicted significantly more people in poverty and drew a much sharper distinction between poverty and wealth. In Nepal, the generalized model predicted a greater number of people in poverty in line with population density, highlighting the influence of mobility data on model outputs in areas with high population density and potentially lower mobility.
Discussion
This study demonstrates the potential of using a small set of easily replicable CDR-derived features to estimate poverty across diverse geographical settings. The generalizability of the five key metrics highlights their potential for broader application in LMICs, offering a cost-effective and timely approach to poverty mapping. The results support the inclusion of aggregated, anonymized CDRs as a valuable component of poverty monitoring and evaluation. While the study does not aim to determine causality, the findings suggest relationships between socioeconomic status and mobility patterns (wealthier individuals exhibiting higher mobility) and call timing (poorer individuals making more calls during cheaper nighttime rates). The differences in model performance between the full and generalized models highlight the importance of incorporating country-specific features whenever possible to improve accuracy and capture the nuanced realities of poverty within each context. This emphasizes the need for understanding local contexts and incorporating ancillary data to refine poverty estimates.
Conclusion
This study provides strong evidence supporting the use of a limited set of generalizable CDR features for estimating poverty across diverse LMICs. The findings highlight the potential of mobile phone data as a cost-effective and timely supplement to traditional data sources. Future research should focus on exploring additional CDR features, incorporating other data sources (such as satellite imagery), and refining methods to address biases and limitations inherent in CDR data. The consistent predictive power of the identified metrics across different contexts demonstrates their potential for integration into national and international poverty monitoring frameworks.
Limitations
The study acknowledges several limitations. Data availability varied across the three countries, with Bangladesh offering a significantly larger number of potential CDR variables. The temporal mismatch between the CDR and survey data for Nepal and Bangladesh could have influenced the findings. The reliance on the DHS wealth index as a proxy for poverty means that the results reflect asset-based poverty rather than income or consumption-based poverty. Additionally, the study’s methodology relies on assumptions about user home location and spatial coverage of mobile towers which may introduce uncertainties. Finally, self-selection bias related to mobile phone ownership means that the poorest segments of the population may be under-represented in the CDR data. Despite these limitations, the study's findings contribute significantly to the understanding of how CDR data can be used for large-scale poverty mapping.