Introduction
The rapid growth of urban populations and built-up areas, particularly in low- and lower-middle-income countries, necessitates improved monitoring of urbanization for achieving the UN Sustainable Development Goals (SDGs). Building data is crucial for this, but comprehensive, spatially detailed global inventories are lacking. Traditional sources like national statistics offices often suffer from underfunding, limited accessibility, and outdated information. This data scarcity hinders accurate assessment of indicators related to SDG 11 (sustainable cities and communities), such as land consumption rates, slum proportions, and access to open spaces. While remote sensing offers some data, it lacks the detail needed for nuanced analyses of urban structure and change. OpenStreetMap (OSM), a collaborative, open-source map, presents a potential alternative, but its data quality is unevenly distributed geographically. This study addresses the need for a comprehensive understanding of the completeness and inequalities in global OSM building data to inform both data producers and users.
Literature Review
Existing research has explored OSM data completeness in specific cities, comparing OSM data to authoritative sources. However, these studies lack global scope and often rely on local reference data, limiting generalizability. To overcome this, researchers have proposed using globally available proxy data such as nighttime lights, built-up areas, and population data to estimate and predict OSM building completeness. This study builds upon previous work by providing a large-scale spatio-temporal analysis, incorporating a more extensive range of data sources to model and assess OSM completeness.
Methodology
This research employs a machine learning approach to assess OSM building data completeness across 13,189 urban centers globally, representing approximately 50% of the world's population. The methodology addresses two research questions: (1) What is the completeness of OpenStreetMap building data globally? (2) How unequally is urban OpenStreetMap building data distributed? A Random Forest regression model is trained using a comprehensive collection of open building data from commercial and authoritative sources, along with various predictor variables. The predictor variables include remote sensing data (land cover, population distribution, nighttime lights), the Subnational Human Development Index (SHDI), and urban road network density. The model assesses completeness at a 1 square kilometer grid cell resolution for each urban center. The analysis then investigates the spatial distribution of building completeness using several spatial statistics. Specifically, the study uses the Gini coefficient to measure the evenness of completeness, Moran's I to assess spatial autocorrelation, and Local Moran statistics to examine local spatial clusters. Additionally, it uses OSM data to examine the contributions of humanitarian and corporate mapping efforts to completeness.
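To make the evenness measure concrete, the following is a minimal Python sketch of a Gini coefficient computed over per-cell completeness values. It is an illustration under stated assumptions, not the study's implementation: the function name `gini`, the rank-weighted formula variant, and the two toy cities are all hypothetical choices made for this example.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means perfectly even; values close to 1 mean the total is
    concentrated in very few cells.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Convention chosen for this sketch: no data or all zeros counts as even.
        return 0.0
    # Rank-weighted formula on ascending-sorted values (ranks start at 1):
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical toy data: completeness (0..1) for ten grid cells of two cities.
even_city = [0.5] * 10            # every cell half mapped
uneven_city = [0.0] * 9 + [1.0]   # one fully mapped cell, nine unmapped cells

gini_even = gini(even_city)
gini_uneven = gini(uneven_city)
```

In this toy case both cities have some mapped buildings, but the evenly mapped city scores 0 while the city with a single mapped cell scores 0.9, which is the kind of contrast the Gini coefficient is meant to capture.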
Key Findings
The analysis reveals a highly uneven distribution of OSM building data completeness. For 1,848 cities (14% of those analyzed), OSM building data exceeded 80% completeness, encompassing 16% of the global urban population. Conversely, for 9,163 cities (69% of those analyzed), completeness was less than 20%, representing 48% of the global urban population. The global average urban OSM building completeness is 24%. Europe & Central Asia and North America showed relatively high completeness (71% and 64%, respectively), while Latin America & the Caribbean, East Asia & the Pacific, the Middle East & North Africa, and South Asia exhibited lower completeness (20%, 20%, 12%, and 9%, respectively). Sub-Saharan Africa's completeness was slightly above the global average at 30%. Organized humanitarian mapping activities contributed approximately 10% of building footprints globally, with a greater impact in Sub-Saharan Africa, where over 50% of building edits were related to such activities. Corporate mapping contributions constituted less than 2% of global building edits. A strong positive correlation was observed between city size and completeness, although the temporal patterns were similar across different city size classes. The global Gini coefficient for building completeness was 0.8, indicating high inequality. Inequality increased sharply between 2008 and 2014, then leveled off and has since declined slightly. Spatial autocorrelation, measured by Moran's I, has also declined since 2014, indicating a reduction in spatial clustering, though significant clustering persisted. An intra-urban analysis categorized urban centers into five types in three groups based on completeness, evenness (Gini coefficient), and spatial clustering (Moran's I): 1a (unmapped), 1b (sparsely mapped with clustered mapping), 2a (divided cities with clustered mapping), 2b (segregated cities with mapped and unmapped areas), and 3a (well-mapped).
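As a rough illustration of what a higher or lower Moran's I means for spatial clustering, here is a minimal Python sketch of global Moran's I on a small grid. Everything in it is an assumption made for the example: the function name `morans_i`, the binary rook-contiguity weights (cells sharing an edge are neighbors), and the two toy grids. The study's actual weights and software are not described in this summary.

```python
def morans_i(grid):
    """Global Moran's I for a rectangular grid with rook-contiguity weights.

    Positive values: similar cells sit next to each other (clustering).
    Negative values: neighboring cells tend to differ (dispersion).
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(v for row in grid for v in row) / n
    dev = [[v - mean for v in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        # Convention chosen for this sketch: a constant grid has no pattern.
        return 0.0
    cross = 0.0      # sum over neighbor pairs of dev_i * dev_j
    weight_sum = 0   # total number of (directed) neighbor links
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_sum += 1
    return (n / weight_sum) * (cross / denom)


# Hypothetical toy cities: 1 = fully mapped cell, 0 = unmapped cell.
clustered = [[1, 1, 0, 0] for _ in range(4)]                           # mapped west, unmapped east
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]     # alternating cells

i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
```

Both toy grids have exactly half of their cells mapped, yet the split city yields a clearly positive Moran's I while the checkerboard yields -1. This is why completeness alone cannot distinguish a uniformly half-mapped city from one divided into mapped and unmapped districts.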
Discussion
The findings highlight the complex spatial patterns of OSM building data completeness, influenced by socioeconomic factors, geographic location, and the contributions of various mapping initiatives. The significant inequalities observed underscore the need for careful consideration of potential biases when using OSM data for global urban analysis. The study's methodology provides a valuable framework for assessing completeness biases at multiple scales. The results confirm previously observed biases towards high-income countries, yet reveal a more nuanced picture than a simple dichotomy between high- and low-income regions. The temporal analysis shows a gradual decrease in inequality, particularly driven by humanitarian mapping efforts focusing on areas with lower socioeconomic development. However, recent trends suggest a potential stagnation or increase in inequality, possibly linked to the COVID-19 pandemic. The observed spatial clustering indicates that mapping activities tend to concentrate in specific areas, resulting in uneven coverage.
Conclusion
This research provides a comprehensive, global-scale assessment of OSM building data completeness and inequalities. The findings emphasize the importance of accounting for spatial biases in urban analyses using OSM. The study offers valuable recommendations for both data producers, who can use the completeness maps to guide future mapping efforts, and data users, who can employ the provided dataset to assess and mitigate bias. Future research should focus on improving the accuracy of building data, integrating OSM data with other sources, and extending this analysis to rural areas and other data quality dimensions.
Limitations
The study is limited to urban centers, potentially underestimating the overall global inequality as rural areas may have even lower completeness. The machine learning model's performance is influenced by biases in the training data and algorithms, impacting the accuracy of completeness estimations. The study also does not directly assess accuracy or attribute completeness of building data, focusing instead on the spatial extent of building footprints. Additionally, there is uncertainty associated with data from rapidly urbanizing areas where training data was limited.