Economics
Machine learning and phone data can improve targeting of humanitarian aid
E. Aiken, S. Bellue, et al.
COVID-19 caused sharp declines in living standards globally, with over 100 million people estimated to have entered extreme poverty. Many low- and middle-income countries lack recent income data, comprehensive social registries, or recent censuses with which to target social assistance accurately. Togo launched the Novissi emergency cash program in April 2020 to provide bi-weekly digital transfers to affected individuals, but it lacked a social registry and up-to-date poverty data. The research question is whether machine learning applied to non-traditional data (satellite imagery and mobile phone metadata) can target aid to the poor more accurately than feasible alternatives such as geographic or occupation-based targeting. The study evaluates the accuracy, welfare effects, and fairness of phone-based targeting in Togo’s rural Novissi expansion and in a simulated nationwide program, addressing urgent needs in crisis settings where traditional data are unavailable or stale.
The study builds on literatures in: (1) poverty targeting and social protection design (e.g., proxy means tests, geographic and community-based targeting, self-targeting), (2) machine learning for poverty estimation using satellite imagery and mobile phone metadata, and (3) fairness and algorithmic decision-making. Prior work shows satellite and phone data can predict wealth at small-area and individual levels, and that PMTs can degrade over time and face implementation limits in low-data environments. The paper situates phone-based ML targeting among standard approaches like asset indices, Poverty Probability Index (PPI), and PMTs, and discusses the trade-offs between feasibility, accuracy, and equity.
Setting: Togo’s Novissi program provided digital cash transfers via mobile money. Initial eligibility relied on the national voter registry and informal occupation status; the rural expansion aimed to reach the poorest individuals in the 100 poorest cantons.
Approach: Two-step targeting using non-traditional data.
Step 1: Obtain micro-estimates of relative wealth at 2.4 km grid resolution from machine learning on high-resolution satellite imagery; aggregate to cantons (population-weighted) to identify poor areas.
Step 2: Predict individual-level poverty among mobile subscribers using supervised machine learning on mobile phone metadata (call detail and usage records).
Training data: Representative phone surveys (2020) measuring wealth via a proxy means test (PMT) and matched to detailed mobile metadata; and a national in-person household survey (2018–2019) with consumption data.
Models: High-dimensional features from phone use; trained to maximize out-of-sample predictive performance of consumption/wealth (Pearson correlations 0.41–0.46). A simpler parsimonious model uses recent phone expenditures only as a proxy.
Evaluation design: Compare the “phone-based” ML approach to counterfactual targeting methods: (a) geographic blanketing at prefecture (admin-2) or canton (admin-3) level, (b) occupation-based targeting (original informal worker rule and an ‘optimal’ strategy targeting the poorest occupation category), (c) phone-expenditure-only proxy, and (d) standard social registry-dependent methods (asset index, PPI, PMT; simulated using 2018–2019 survey).
Two scenarios: (1) Actual rural Novissi expansion (target ≈29% poorest among mobile subscribers in 100 poorest cantons); ground truth is PMT from the 2020 phone survey. (2) Hypothetical nationwide program; ground truth is consumption from the 2018–2019 field survey.
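The two-step approach described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the tuple layouts, function names, and toy values are hypothetical, and the per-subscriber predicted consumption is assumed to come from a model trained elsewhere.

```python
from collections import defaultdict

def canton_wealth(cells):
    """Population-weighted mean relative wealth per canton.

    `cells` is a list of (canton, population, wealth_estimate) tuples,
    one per grid cell (hypothetical layout, not the authors' data format).
    """
    weighted = defaultdict(float)
    population = defaultdict(float)
    for canton, pop, wealth in cells:
        weighted[canton] += pop * wealth
        population[canton] += pop
    return {c: weighted[c] / population[c] for c in weighted if population[c] > 0}

def poorest_cantons(cells, n):
    """Step 1: the n cantons with the lowest population-weighted wealth."""
    wealth = canton_wealth(cells)
    return set(sorted(wealth, key=wealth.get)[:n])

def select_beneficiaries(subscribers, eligible_cantons, share):
    """Step 2: among subscribers in eligible cantons, pick the `share`
    with the lowest predicted consumption.

    `subscribers` is a list of (subscriber_id, canton, predicted_consumption).
    """
    pool = [s for s in subscribers if s[1] in eligible_cantons]
    pool.sort(key=lambda s: s[2])
    k = round(share * len(pool))
    return [s[0] for s in pool[:k]]

# Toy data, made up for illustration only.
cells = [("A", 100, -0.5), ("A", 300, -0.1), ("B", 200, 0.4), ("C", 50, -0.9)]
subscribers = [("s1", "A", 1.2), ("s2", "A", 0.8), ("s3", "B", 0.5),
               ("s4", "C", 2.0), ("s5", "C", 0.9), ("s6", "A", 1.5)]

eligible = poorest_cantons(cells, n=2)
chosen = select_beneficiaries(subscribers, eligible, share=0.4)
print(eligible, chosen)
```

In this toy run, subscriber s3 has the lowest predicted consumption overall but is never considered because canton B fails the geographic step, which is the kind of interaction between the two steps that the evaluation has to account for.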
Metrics: Area under the ROC curve (AUC), accuracy, precision and recall at the 29% targeting threshold (the two coincide by design, because the number of people targeted equals the number classified as truly poor), and rank correlations; social welfare (constant relative risk aversion utility, fixed budget of US$4M across 154,238 registrants); and fairness audits (demographic parity and error distributions by gender and ethnicity).
Temporal robustness: Assess model performance when the training or input data are 18 months old.
Exclusion analysis: Quantify structural exclusions due to prerequisites (voter ID, SIM/phone access, recent usage, program awareness, registration success) and algorithmic errors, using administrative and survey data.
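To make the quota-based metrics concrete, here is a small, hedged sketch in plain Python. The function names and toy values are illustrative assumptions, and the AUC is computed by direct pairwise comparison rather than by whatever software the study used.

```python
def target_poorest(scores, share):
    """Indices of the `share` of people with the lowest scores."""
    k = round(share * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(order[:k])

def precision_recall(targeted, truly_poor):
    """Precision and recall of a targeted set against the truly poor set."""
    hits = len(targeted & truly_poor)
    return hits / len(targeted), hits / len(truly_poor)

def auc(scores, is_poor):
    """Probability that a random poor person is scored below a random
    non-poor person (ties count one half). Lower score = predicted poorer."""
    poor = [s for s, p in zip(scores, is_poor) if p]
    rest = [s for s, p in zip(scores, is_poor) if not p]
    wins = sum((a < b) + 0.5 * (a == b) for a in poor for b in rest)
    return wins / (len(poor) * len(rest))

def accuracy_at_quota(quota, precision):
    """Accuracy when the targeted share equals the truly poor share,
    so that precision equals recall: 1 - 2 * quota * (1 - precision)."""
    return 1 - 2 * quota * (1 - precision)

# Toy example with made-up values: 10 people, true and predicted consumption.
true_consumption = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
predicted = [2.5, 1.0, 6.0, 3.0, 2.0, 7.0, 5.0, 9.0, 8.0, 10.0]
share = 0.3

truly_poor = target_poorest(true_consumption, share)
targeted = target_poorest(predicted, share)
precision, recall = precision_recall(targeted, truly_poor)
is_poor = [i in truly_poor for i in range(len(predicted))]
print(precision, recall, auc(predicted, is_poor))
```

Because the targeted set and the truly poor set have the same size, false positives and false negatives are equal in number, so precision and recall coincide. The same bookkeeping gives accuracy as 1 − 2q(1 − p) for quota q and precision p; with q = 0.29 and p = 0.47 this is about 0.69, consistent with the rural accuracy figure reported in this summary.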
Targeting accuracy and errors:
• Rural Novissi scenario (2020 phone survey, PMT ground truth; poorest 29% target): Phone-based ML AUC=0.70; exclusion error (1−recall)=53%; accuracy 69%; precision/recall 47%; Spearman 0.38. Geographic targeting: AUC=0.59 (canton)–0.64 (prefecture), with higher exclusion errors (59%–78%). Phone-expenditure-only: AUC=0.57; poorer performance than ML.
• Nationwide hypothetical (2018–2019 survey, consumption ground truth): Phone-based ML AUC=0.73; exclusion error ≈50%; better than geographic (AUC=0.66–0.68; exclusion errors 52%–76%). An ‘optimal’ occupation-based approach (targeting agricultural workers) slightly outperforms phone-based (exclusion ≈48%); the original informal-worker rule performs poorly (≈76% exclusion).
• Compared to registry-dependent methods (simulated): Asset index AUC=0.55 (rural) and 0.75 (national); PPI AUC=0.81; PMT AUC=0.85. Thus, phone-based ML (AUC=0.70–0.73) is similar to asset-based indices but less accurate than PPI or a perfectly calibrated PMT.
Relative performance:
• Versus feasible geographic targeting in Togo, the phone-based approach reduces exclusion errors by 4–21%.
• Versus comprehensive social registry methods (hypothetical), phone-based increases exclusion errors by 9–35%.
Welfare:
• With CRRA utility and fixed budget (US$4M; 154,238 registrants), phone-based targeting dominates geographic and other feasible methods in the rural Novissi context and achieves similar maximum utility to asset index and geographic in the national simulation; all targeted methods outperform universal basic income when optimally calibrated.
Fairness:
• Phone-based targeting shows no systematic bias against women or specific ethnic groups in ranking errors; demographic parity gaps exist across all methods, with the largest parity differences in geographic targeting.
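The welfare comparison under CRRA utility with a fixed budget can be sketched as follows. This is an illustration under stated assumptions, not the study's calculation: the budget is split equally among targeted people, the consumption values are made up, and the curvature parameter `rho = 3.0` is an assumed value, since this summary does not state the one used.

```python
import math

def crra_utility(c, rho):
    """Constant relative risk aversion utility of consumption c > 0."""
    if rho == 1:
        return math.log(c)
    return c ** (1 - rho) / (1 - rho)

def welfare(consumption, targeted, budget, rho):
    """Total CRRA utility when `budget` is split equally among `targeted`."""
    transfer = budget / len(targeted) if targeted else 0.0
    return sum(
        crra_utility(c + (transfer if i in targeted else 0.0), rho)
        for i, c in enumerate(consumption)
    )

# Toy data, made up for illustration: consumption for six people.
consumption = [1.0, 1.5, 2.0, 4.0, 6.0, 9.0]
budget = 6.0
rho = 3.0  # assumed curvature, not taken from the summary

universal = welfare(consumption, set(range(6)), budget, rho)
poorest_half = welfare(consumption, {0, 1, 2}, budget, rho)
mistargeted = welfare(consumption, {3, 4, 5}, budget, rho)
print(universal, poorest_half, mistargeted)
```

With concave utility, a unit of transfer raises welfare more when it reaches someone with low consumption, so in this toy case accurate targeting of the poorest half beats a universal transfer, while sending the same budget to the richest half does worse than either. Targeting errors therefore translate directly into welfare losses.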
Sensitivity:
• Benefits of phone-based targeting are larger in more homogeneous populations and when targeting the extreme poor; gains diminish in more heterogeneous national settings.
Temporal stability:
• Using training or input data roughly 18 months old reduces predictive accuracy by 4–6% and precision by 10–14%.
Exclusions:
• Structural prerequisites contribute to exclusion: voter ID possession ≈83–98%; SIM/phone access 50–85% of individuals (lower in rural areas and among women); past phone use coverage for scoring 72–97%; program awareness 35–46%; registration success 72% after multiple attempts; algorithmic recall 47% (exclusion 53%).
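To see how stage-by-stage exclusions can compound, the short sketch below multiplies the pass-through rates listed above. This is purely illustrative arithmetic, not an estimate from the study: it assumes the stages are independent and that each rate applies to those who cleared the previous stage, which the summary does not establish (the rates come from different data sources and populations).

```python
def compound(rates):
    """Product of stage pass-through rates (assumes independent stages,
    each rate applying to those who passed the previous stage)."""
    share = 1.0
    for r in rates:
        share *= r
    return share

# (low, high) pass-through rates as listed in the summary above.
stages = {
    "voter ID": (0.83, 0.98),
    "SIM/phone access": (0.50, 0.85),
    "past phone use": (0.72, 0.97),
    "program awareness": (0.35, 0.46),
    "registration success": (0.72, 0.72),
    "algorithmic recall": (0.47, 0.47),
}

low = compound(lo for lo, _ in stages.values())
high = compound(hi for _, hi in stages.values())
print(f"illustrative compounded pass-through: {low:.1%} to {high:.1%}")
```

Even under these simplifying assumptions, the point is qualitative: the algorithmic step is only one of several filters, and the non-algorithmic prerequisites can account for a large part of who is ultimately left out.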
The study demonstrates that ML on mobile phone data can materially improve targeting when traditional data are unavailable or outdated, particularly in crisis settings and in relatively homogeneous populations like rural Togo. By better identifying poor subscribers compared to feasible geographic approaches, phone-based targeting reduces exclusion and inclusion errors and increases social welfare. However, when comprehensive, current social registries exist, methods like PPI or PMT can outperform phone-based ML, indicating that phone-based approaches are complements rather than substitutes. Fairness audits suggest the phone-based method does not systematically disadvantage women or major ethnic groups, though no method achieves perfect demographic parity; geographic targeting exhibits the largest parity gaps. Performance is sensitive to data staleness and context: models require recent, representative training data, and benefits are larger when targeting the extreme poor. Moreover, real-world inclusion is limited not only by algorithmic errors but also by structural barriers (phone access, voter ID, awareness, digital literacy).
Machine-learning models trained on recent survey data and applied to mobile phone metadata can rapidly and cost-effectively improve the targeting of humanitarian cash transfers relative to feasible alternatives in data-scarce settings. In Togo’s Novissi program, this approach reduced exclusion errors versus geographic targeting and increased social welfare, with no evidence of systematic gender or ethnic bias. Relative to idealized registry-based methods (PPI/PMT), performance is lower, underscoring that phone-based targeting should complement—not replace—traditional approaches. Future work should integrate real-time phone data with field-based measurements, explore dynamic targeting after shocks (e.g., identifying those with largest consumption declines or those in affected locations), and address governance, privacy, and data-access frameworks to enable responsible, inclusive social protection systems.
Key limitations include reliance on phone ownership and recent usage (lower among rural residents and women), voter ID possession, self-registration, and digital literacy, all of which create structural exclusion. Algorithmic errors remain sizable (≈53% exclusion at the 29% threshold in the rural scenario). Model performance degrades with stale data (4–6% lower accuracy; 10–14% lower precision at ~18 months). Where current, comprehensive social registries exist, PPI/PMT can yield better accuracy. Parsimonious phone-expenditure proxies perform worse and may be vulnerable to gaming. Data access, privacy, and ethical concerns pose practical constraints and require robust safeguards and governance.