Semi-automatic mapping of pre-census enumeration areas and population sampling frames

Social Work

S. Qader, V. Lefebvre, et al.

Discover how a semi-automatic method for mapping enumeration areas (EAs) can speed up the preparation of national censuses and surveys. The approach, developed by Sarchil Qader and colleagues, combines high-resolution gridded population data with publicly available boundary features and is demonstrated for Somalia.

Introduction
Census enumeration areas (EAs) are the basic units of census data collection and serve as national sampling frames for surveys. In many countries, particularly those affected by poverty or conflict, EA demarcations are incomplete, outdated, or absent. Traditional methods of EA creation (manual digitization of high-resolution satellite imagery or ground-based boundary mapping) are time-consuming, expensive, and labor-intensive. Creating EAs is also an optimization problem over the population and area of each unit, requiring a balance among several criteria: mutual exclusivity and exhaustiveness (covering the entire country without overlaps); easily identifiable boundaries for field teams; alignment with administrative boundaries; compactness of shape; approximately equal population sizes; a size that one enumerator can cover; suitability for varied tabulations; sufficient size to protect data privacy; and usefulness for other data collection activities.
The lack of well-defined EAs in many low- and middle-income countries (LMICs), especially in conflict zones, means there is no nationally representative sampling frame. This limits the ability to conduct accurate, representative population surveys and risks undersampling vulnerable populations. EA creation has historically been costly. Although Geographic Information Systems (GIS) and high-resolution satellite imagery have made manual digitization easier, it remains labor-intensive and prone to human error, and population and area constraints are hard to enforce by hand. Existing tools that build sampling frames from gridded population estimates, such as GridSample and Geo-sampling, or that rely on statistical region merging, produce units whose boundaries may not follow the ground features enumerators need.
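To make the optimization criteria concrete, the Python sketch below checks one candidate EA polygon against three of them: a population range, a maximum area, and compactness. The 0 to 750 population range and 9 km² cap mirror the rural limits reported for Somalia; the Polsby-Popper compactness score (4πA/P²), its 0.3 cutoff, and the function names are assumptions for illustration, not taken from the study. Coordinates are assumed to be in a projected, metre-based system.

```python
import math

Point = tuple[float, float]


def polygon_area(coords: list[Point]) -> float:
    """Shoelace formula; coords is an open ring of (x, y) vertices in metres."""
    n = len(coords)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(coords: list[Point]) -> float:
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))


def polsby_popper(coords: list[Point]) -> float:
    """Compactness in (0, 1]: 1.0 for a circle, pi/4 (about 0.785) for a square."""
    return 4 * math.pi * polygon_area(coords) / polygon_perimeter(coords) ** 2


def check_ea(
    coords: list[Point],
    population: float,
    min_pop: float = 0,
    max_pop: float = 750,          # rural upper limit reported for Somalia
    max_area_km2: float = 9.0,     # rural area cap reported for Somalia
    min_compactness: float = 0.3,  # hypothetical cutoff, not from the study
) -> list[str]:
    """Return the names of violated criteria; an empty list means the EA passes."""
    problems = []
    if not (min_pop <= population <= max_pop):
        problems.append("population")
    if polygon_area(coords) / 1e6 > max_area_km2:  # m² -> km²
        problems.append("area")
    if polsby_popper(coords) < min_compactness:
        problems.append("compactness")
    return problems
```

A 1 km square with 400 people passes all three checks, while a 10 km by 100 m strip of the same area fails on compactness even though its area is acceptable.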
This study introduces a novel semi-automatic approach leveraging freely available data on population density (e.g., from WorldPop), and georeferenced features (e.g., from OpenStreetMap) to create EAs. The approach is demonstrated in the context of Somalia, where such detailed mapping is currently non-existent.
Literature Review
Several methods exist for creating population sampling frames. GridSample and Geo-sampling draw on gridded population estimates, while other approaches combine remote sensing data and GIS techniques to generate homogeneous regions. Statistical region merging algorithms, such as the max-p algorithm and ArcGIS/AZTool, group areas with similar characteristics. However, these methods may not produce boundaries that follow the ground features field teams rely on. The ArcGIS/AZTool toolkit also uses existing EAs as its base units and requires suitable EA-level data, which are often unavailable in LMICs. This research addresses the gap by combining freely available high-resolution population models with georeferenced features to create a new set of EAs.
Methodology
This study uses a semi-automatic EA delineation process based on a 'split and merge' methodology inspired by image segmentation techniques, applied here to create a national sampling frame for Somalia. The general approach involves three steps, followed by merging.
First, the country is split into small sub-areas using vector data (roads, waterways, administrative boundaries) from OpenStreetMap (OSM), so that EA boundaries align with visible features.
Second, the population of each sub-area is estimated from a 100 m × 100 m gridded population density map. This map combines multiple data sources, including building density, household density, and population density from the World Bank survey, complemented by DigitalGlobe population estimates and the 2014 Population Estimation Survey (PESS). Settled areas that lack data are assigned modelled population densities based on similar settlements; known unsettled locations are set to zero, and potentially settleable areas with missing data are given low values. Finally, the map is rescaled to match the PESS 2014 regional totals. The urban stratum is defined by the PESS 2014 urban EAs, and the remaining area is treated as rural.
Third, any sub-area exceeding a specified population or area threshold (e.g., population > 750 or area > 9 km²) is split further using a quadtree algorithm until every region meets the predefined constraints.
After splitting, the Automated Zone-design Tool (AZTool) merges adjacent regions into EAs whose population size and area fall within user-defined ranges. Hard boundaries (administrative units, large rivers) prevent merging across certain lines. The algorithm aims to produce EAs as close as possible to the target population and area sizes while keeping them compact.
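The quadtree rule in the third step can be sketched on a raster, assuming for simplicity that each sub-area is a rectangular window of the 100 m population grid (so each cell covers 0.01 km²). In the study the quadtree operates on OSM-bounded sub-areas rather than grid windows, so the hypothetical `quadtree_split` function below shows only the recursive splitting rule: keep quartering a window while its population exceeds 750 or its area exceeds 9 km².

```python
Grid = list[list[float]]
Leaf = tuple[int, int, int, int, float]  # (row0, col0, n_rows, n_cols, population)


def window_population(grid: Grid, r0: int, c0: int, nr: int, nc: int) -> float:
    return sum(grid[r][c] for r in range(r0, r0 + nr) for c in range(c0, c0 + nc))


def quadtree_split(
    grid: Grid,
    r0: int,
    c0: int,
    nr: int,
    nc: int,
    max_pop: float = 750,
    max_area_km2: float = 9.0,
    cell_km2: float = 0.01,  # one 100 m x 100 m cell
) -> list[Leaf]:
    """Quarter a grid window recursively until each piece meets both limits."""
    pop = window_population(grid, r0, c0, nr, nc)
    area = nr * nc * cell_km2
    # Stop when both constraints hold, or when a single cell cannot be split further.
    if (pop <= max_pop and area <= max_area_km2) or (nr == 1 and nc == 1):
        return [(r0, c0, nr, nc, pop)]
    half_r, half_c = nr // 2, nc // 2
    leaves: list[Leaf] = []
    for dr, sub_r in ((0, half_r), (half_r, nr - half_r)):
        for dc, sub_c in ((0, half_c), (half_c, nc - half_c)):
            if sub_r > 0 and sub_c > 0:  # skip empty quadrants of 1-wide windows
                leaves.extend(
                    quadtree_split(grid, r0 + dr, c0 + dc, sub_r, sub_c,
                                   max_pop, max_area_km2, cell_km2)
                )
    return leaves
```

A single cell that still exceeds the population limit is returned as-is because it cannot be split further; a real workflow would need to flag such cells for review.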
In Somalia, this process is applied separately to each of the 18 pre-war regions. Each EA's probability of selection is proportional to its population within its pre-war region and stratum (urban or rural). Rural EAs are constrained to populations from 0 to 750 and a maximum area of 9 km². The semi-automatically generated urban EAs are compared with the manually digitised EAs from the 2014 PESS in Mogadishu and Hargeysa, with a focus on population distribution.
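AZTool's zone-design optimization is more elaborate than can be shown here, but the role of hard boundaries can be illustrated with a much simpler greedy heuristic, swapped in purely for illustration. In this sketch (the function name, data layout, and merge rule are all assumptions), under-populated zones are absorbed into their least-populated adjacent zone in the same region, provided the merged zone stays within the urban limits reported for Somalia (150 to 2000 people, at most 4 km²). Zones never merge across regions, mirroring the separate processing of each pre-war region.

```python
def greedy_merge(
    units: dict[str, tuple[str, float, float]],  # id -> (region, population, area_km2)
    adjacency: dict[str, set[str]],               # id -> ids of adjacent units
    min_pop: float = 150,
    max_pop: float = 2000,
    max_area_km2: float = 4.0,
) -> dict[str, set[str]]:
    """Greedily merge under-populated zones; returns zone id -> member unit ids."""
    zones = {u: {u} for u in units}
    zone_of = {u: u for u in units}
    zone_region = {u: units[u][0] for u in units}
    zone_pop = {u: units[u][1] for u in units}
    zone_area = {u: units[u][2] for u in units}

    merged = True
    while merged:
        merged = False
        # Try the least-populated zones first.
        for z in sorted(zones, key=lambda k: zone_pop[k]):
            if zone_pop[z] >= min_pop:
                continue
            neighbours = {zone_of[n] for u in zones[z] for n in adjacency.get(u, ())} - {z}
            candidates = [
                n for n in neighbours
                if zone_region[n] == zone_region[z]  # hard boundary: never cross regions
                and zone_pop[n] + zone_pop[z] <= max_pop
                and zone_area[n] + zone_area[z] <= max_area_km2
            ]
            if not candidates:
                continue
            target = min(sorted(candidates), key=lambda n: zone_pop[n])
            for u in zones[z]:
                zone_of[u] = target
            zones[target] |= zones.pop(z)
            zone_pop[target] += zone_pop.pop(z)
            zone_area[target] += zone_area.pop(z)
            zone_region.pop(z)
            merged = True
            break  # zones changed, so re-sort and start again
    return zones
```

Zones that cannot be merged without crossing a hard boundary or exceeding a limit are left below the minimum population, a case a real workflow would need to flag rather than silently accept.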
Key Findings
The semi-automatic method generated 113,367 rural EAs in Somalia, with populations from 0 to 750 and a maximum area of 9 km². Selection probabilities were calculated for every EA and sum to 1 within each regional stratum. In the urban areas of Mogadishu and Hargeysa, the semi-automatic approach created EAs with populations from 150 to 2000 and areas from 2000 m² to 4 km². The manually digitised urban EAs from the 2014 PESS showed inconsistencies, including gaps (areas with zero population) and overly large areas, whereas the semi-automatically generated EAs adhered more closely to ground features such as roads. Creating the EAs for Somalia took only a few weeks for a team of two people, a significant time saving over traditional manual approaches. The analysis also shows that the semi-automatic EAs satisfy the defined EA criteria: mutual exclusivity, exhaustiveness, alignment with ground boundaries, compactness, and a size suitable for enumerator coverage. In addition, the method makes it straightforward to create a nationally representative sampling frame with selection probabilities proportional to population size.
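Because selection probabilities are proportional to EA population within each region and stratum, each probability is the EA's population divided by its region-stratum total, so the probabilities in each region-stratum sum to 1. A minimal Python sketch, assuming each EA is a dict with hypothetical `id`, `region`, `stratum`, and `population` keys:

```python
from collections import defaultdict


def selection_probabilities(eas: list[dict]) -> dict[str, float]:
    """Population-proportional selection probability of each EA within its region-stratum."""
    totals: defaultdict[tuple[str, str], float] = defaultdict(float)
    for ea in eas:
        totals[(ea["region"], ea["stratum"])] += ea["population"]
    probs = {}
    for ea in eas:
        total = totals[(ea["region"], ea["stratum"])]
        # Guard against an empty region-stratum rather than dividing by zero.
        probs[ea["id"]] = ea["population"] / total if total > 0 else 0.0
    return probs


frame = [
    {"id": "e1", "region": "R1", "stratum": "urban", "population": 300},
    {"id": "e2", "region": "R1", "stratum": "urban", "population": 100},
    {"id": "e3", "region": "R1", "stratum": "rural", "population": 0},
    {"id": "e4", "region": "R1", "stratum": "rural", "population": 750},
]
print(selection_probabilities(frame))
# {'e1': 0.75, 'e2': 0.25, 'e3': 0.0, 'e4': 1.0}
```

EAs with zero population, which the rural range allows, receive a probability of zero and so are never drawn.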
Discussion
This study demonstrates a novel semi-automatic approach to EA delineation with substantial advantages over traditional methods. The results highlight the method's time and cost efficiency, which is particularly relevant in resource-constrained settings. Aligning EA boundaries with ground features makes the maps easier for field teams to use, and deriving a probability-based sampling frame from the EAs improves the accuracy and representativeness of population surveys. The key advantage is the method's use of readily available, up-to-date geospatial data to generate comprehensive and accurate EAs, addressing a challenge many LMICs face in building suitable sampling frames. Although the EAs generated for Somalia fulfill most EA criteria, their quality is bounded by the quality of the input data, and future work should address these limitations.
Conclusion
This paper presents a novel semi-automatic approach to mapping pre-census enumeration areas, successfully demonstrated in Somalia. The method leverages freely available, high-resolution data and significantly reduces the time and cost compared to manual methods. The resulting EAs are consistent with standard criteria, offering a valuable tool for low-income and data-poor settings. Future research should focus on improving the algorithm's robustness, incorporating additional data sources, and developing a user-friendly open-source software tool for broader accessibility.
Limitations
The accuracy of the EA delineation depends on the quality of the input data: high-resolution population estimates and georeferenced features. In Somalia, some data manipulation was needed to produce a reliable population density map, which limits reproducibility. The quality of OSM data also affects the accuracy of boundary delineation, with discrepancies noted in sparsely populated areas. Reliance on AZTool within ArcGIS limits accessibility, so a standalone, user-friendly tool is needed. Spot-checking is necessary to resolve discrepancies between modelled populations and ground-truth census data, and could be built into the iterative cycle of EA creation and refinement.
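Spot-checking could take the form of comparing modelled EA populations with field counts for a sample of EAs and flagging large discrepancies for review. In this hypothetical sketch, the `spot_check` function and its 25% relative-error tolerance are assumptions, not thresholds from the study; any modelled population in an EA where nobody was counted is also flagged.

```python
def spot_check(
    modelled: dict[str, float],
    observed: dict[str, float],
    tolerance: float = 0.25,  # hypothetical 25% relative-error threshold
) -> list[str]:
    """Return ids of spot-checked EAs whose modelled population disagrees with the field count."""
    flagged = []
    for ea_id, obs in sorted(observed.items()):
        mod = modelled[ea_id]
        if obs == 0:
            mismatch = mod > 0  # people modelled where none were found
        else:
            mismatch = abs(mod - obs) / obs > tolerance
        if mismatch:
            flagged.append(ea_id)
    return flagged
```

Only EAs present in `observed` are compared, so the check scales with the size of the field sample rather than the whole frame.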