Assessing urban livability in Shanghai through an open source data-driven approach

Urban Studies

Y. Long, Y. Wu, et al.

Explore the framework developed by Yin Long, Yi Wu, Liqiao Huang, Jelena Aleksejeva, Deljana Iossifova, Nannan Dong, and Alexandros Gasparatos to analyze urban livability in Shanghai. The research uses open-source data on housing, transportation, and living conditions to reveal insights that can guide future urban planning.

Introduction
The paper addresses how to assess and map urban livability at sub-city scale in rapidly urbanizing contexts where high-quality, disaggregated socioeconomic data are limited. It situates livability as a multi-dimensional concept encompassing access to services, demographics, socioeconomic change, culture, and environment. Conventional city-level indices and rankings, often reliant on aggregated statistics, miss spatial disparities and underutilize increasingly available spatial data such as Points of Interest (POIs). The study highlights challenges in reconciling disparate open datasets (resolution, coverage, quality) and the risk of data overload. China—and Shanghai in particular—provides a pertinent context due to rapid urbanization, uneven access to amenities, political constraints on detailed official data, and the growing availability of open-source spatial data. The research goal is to develop and apply an open-source, data-driven framework that integrates residential building clusters (RBCs), population distribution, transport networks, and POIs to produce spatially explicit livability scores for Shanghai, identify low-livability areas, and inform planning priorities within 1–2 km of residential areas.
Literature Review
The study reviews diverse approaches to livability assessment, including global rankings and composite indices applied to Europe, Australia, and Singapore, and studies linking livability to transport choice and urban form. It notes that conventional methods often rely on limited or highly aggregated indicators. Recent work increasingly draws on spatially explicit demographic, socioeconomic, and POI data, with applications using machine learning (e.g., Levenberg–Marquardt back-propagation, LMBP), convenience indices based on the Analytic Hierarchy Process (AHP), and analyses of the relationships between POIs, transport networks, urban form, and social activities. Despite these advances, integrating heterogeneous open datasets remains challenging because of differences in spatial resolution, coverage, and quality; as a result, POIs are underused and critical livability factors such as housing quality are often omitted. The paper positions its approach within this gap by making full use of POIs together with open data on housing, population, and transport to derive sub-city livability patterns.
Methodology
Conceptual framework and scope: Livability is treated as multi-dimensional, focusing on (a) housing characteristics, (b) accessibility to transport, and (c) availability and accessibility of amenities and services (education, medical/health, recreation, transport service, and living services). Analyses are centered on Residential Building Clusters (RBCs), both to make use of the available open data and to reflect conditions within walking and short-transit ranges (1 km and 2 km radii).
Study site: Central Shanghai districts (Xuhui, Putuo, Yangpu, Pudong, Hongkou, Changning, Jingan, Huangpu) are analyzed because of their high population density, economic activity, and diversity of amenities, as well as challenges such as congestion and inequality.
Data collection and preprocessing:
- RBCs (housing characteristics): Scraped from Lianjia (July 2020) with Python. Metadata include ID, community and district names, address, housing price (RMB/m²), construction year (building age), structure, fees, companies, area, number of households, and coordinates. In total, 15,994 valid RBCs were extracted. Prices were binned into nine classes for analysis.
- Population: WorldPop 100 m × 100 m grid (2020) for population distribution, calibrated to RBC household counts where needed.
- Transport network and access: Open-source transport-related POIs; bus stops (13,334) and metro entrances (1,506) were identified via API IDs/types.
- POIs: Baidu Map Place API records up to the end of 2020, approximately 15 million before filtering. POIs are classified into 23 first-level categories (235 second-level; 2,008 third-level). For computational tractability and relevance to livability, the 23 categories were consolidated into five domains: education; medical service; recreation (landscape, sports, indoor, event); transportation service; and living service (restaurant, business, shopping, living, enterprise, car service, government, hotel, passages, finance, car maintenance, public service, car sales, motor service).
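The summary states that housing prices were binned into nine classes but not how the class breaks were chosen. A minimal sketch, assuming equal-count (quantile) classes; the helper name `price_classes` is hypothetical, and equal-interval or natural-breaks classes would be equally plausible readings of the text:

```python
import numpy as np


def price_classes(prices, n_classes=9):
    """Hypothetical helper: assign each price (RMB/m²) to a class 1..n_classes.

    Assumption: equal-count (quantile) breaks; the source only says "nine classes".
    """
    p = np.asarray(prices, dtype=float)
    # Interior break points, e.g. the 1/9, 2/9, ..., 8/9 quantiles for nine classes.
    inner_edges = np.quantile(p, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    # digitize(right=True) returns 0..len(inner_edges); shift so classes start at 1.
    return np.digitize(p, inner_edges, right=True) + 1
```

For example, `price_classes(np.arange(1, 91) * 1000.0)` places ten of the ninety prices in each class from 1 to 9.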
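The consolidation of POI categories into five livability domains amounts to a lookup table followed by a count per domain. The sketch below uses the category names listed above; the exact label strings, and the names `DOMAIN_OF_CATEGORY` and `domain_counts`, are hypothetical, since the raw Baidu category strings are not given:

```python
from collections import Counter

# Hypothetical label spellings; the groupings follow the five domains in the text.
LIVING_SERVICE = (
    "restaurant", "business", "shopping", "living", "enterprise",
    "car service", "government", "hotel", "passages", "finance",
    "car maintenance", "public service", "car sales", "motor service",
)
DOMAIN_OF_CATEGORY = {
    "education": "education",
    "medical service": "medical service",
    "landscape": "recreation",
    "sports": "recreation",
    "indoor": "recreation",
    "event": "recreation",
    "transportation service": "transportation service",
    **{name: "living service" for name in LIVING_SERVICE},
}
DOMAINS = ("education", "medical service", "recreation",
           "transportation service", "living service")


def domain_counts(poi_categories):
    """Count POIs per livability domain, skipping categories with no mapping."""
    counts = Counter(
        DOMAIN_OF_CATEGORY[c] for c in poi_categories if c in DOMAIN_OF_CATEGORY
    )
    # Report every domain, including those with no POIs nearby.
    return {d: counts.get(d, 0) for d in DOMAINS}
```

Returning all five domains, even with zero counts, keeps later per-domain calculations aligned across RBCs.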
POIs lying more than 2 km from the downtown area/RBCs were excluded, and distances from each RBC to nearby POIs were computed.
Normalized factor scores (RBC level): Four factors are min–max normalized to [0, 1]: housing price, building age, population density (average population per building in the RBC, with the number of buildings used as a proxy for building area where needed), and transport accessibility. A fifth factor, POI diversity, is computed with an entropy-based index over POI types within 1 km and 2 km of each RBC, reflecting the variety and evenness of surrounding services.
- Transport accessibility: A location-based cumulative opportunity index counts the bus and metro opportunities within 1 km and 2 km of each RBC and divides this count by the total number of opportunities citywide.
- POI diversity: Entropy of the distribution of POI categories around each RBC; higher values indicate greater diversity.
Livability scoring: Livability is modeled as a tradeoff between (i) the average of the five RBC factors (price, age, density, transport access, POI diversity) and (ii) the balance of POIs around each RBC, measured with a Gini–Simpson-type index that captures concentration versus diversity across the five POI domains. Preferences for POI domains are represented by weights proportional to the abundance of POIs in each domain around an RBC. Five livability scores are computed per RBC, each emphasizing one POI domain (education, medical, recreation, transportation service, living service) while keeping the RBC's intrinsic factors constant. Scores are mapped for both the 1 km and 2 km radii.
Spatial analysis and mapping: Inverse Distance Weighting (IDW) and Standard Deviational Ellipse (SDE) analyses are used to reveal spatial patterns and trends in livability scores across the city, identify hotspots and overlaps among dimensions, and depict directional trends (e.g., east–west). Analyses were conducted in ESRI ArcMap 10.4.
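Min–max normalization maps each factor to [0, 1] via (x − min) / (max − min). A minimal sketch of the four RBC factors, with population density computed as population per building as described above; the helper names are hypothetical, and no factor is inverted because the summary does not say whether, for example, older buildings should score lower:

```python
import numpy as np


def min_max(values):
    """Rescale a 1-D array to [0, 1]; a constant input maps to all zeros."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)
    return (v - v.min()) / span


def rbc_factor_scores(price, building_age, population, n_buildings, transport_access):
    """Hypothetical helper returning the four normalized RBC factors.

    Density = population / number of buildings (buildings as an area proxy).
    Assumption: every factor keeps its raw direction (no inversion).
    """
    density = np.asarray(population, dtype=float) / np.asarray(n_buildings, dtype=float)
    return {
        "price": min_max(price),
        "age": min_max(building_age),
        "density": min_max(density),
        "transport": min_max(transport_access),
    }
```

Because min–max scaling depends on the extremes of each factor, a single outlier RBC can compress the scores of all the others; this is a property of the technique, not something the summary discusses.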
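The cumulative opportunity index described above is the number of transport stops within a radius of an RBC divided by the citywide total of stops. A minimal sketch, assuming projected coordinates in metres and pooling bus stops and metro entrances into one array; the function name is hypothetical:

```python
import numpy as np


def cumulative_opportunity(rbc_xy, stop_xy, radius_m):
    """Hypothetical helper: share of all citywide stops within radius_m of each RBC.

    Assumes projected (metre) coordinates so Euclidean distance applies.
    Bus stops and metro entrances are pooled into one array of opportunities.
    """
    rbc = np.asarray(rbc_xy, dtype=float)[:, None, :]     # (n_rbc, 1, 2)
    stops = np.asarray(stop_xy, dtype=float)[None, :, :]  # (1, n_stops, 2)
    dist = np.linalg.norm(rbc - stops, axis=2)            # (n_rbc, n_stops)
    within = (dist <= radius_m).sum(axis=1)               # stops in range per RBC
    return within / stops.shape[1]                        # divide by citywide total
```

This brute-force version builds the full distance matrix; at the scale reported (15,994 RBCs and 14,840 stops) that is roughly 237 million distances, so in practice one would process RBCs in chunks or use a spatial index such as SciPy's `cKDTree`.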
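The summary does not give the exact entropy formula used for POI diversity. A common choice is Shannon entropy, H = −Σ pᵢ ln pᵢ, where pᵢ is the share of nearby POIs in category i. The sketch below assumes that form, with optional division by ln(number of possible categories) so results lie in [0, 1]; the function name and the normalization option are assumptions:

```python
import math
from collections import Counter


def poi_entropy(categories, n_types=None):
    """Shannon entropy of POI category shares around one RBC (assumed form).

    If n_types (number of possible categories) is given, divide by ln(n_types)
    so the result lies in [0, 1].
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    if n_types and n_types > 1:
        h /= math.log(n_types)
    return h
```

Dividing by the log of the number of possible categories, rather than the number observed, keeps an RBC with few kinds of services from scoring as "perfectly diverse".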
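A Gini–Simpson index is 1 − Σ pᵢ², where pᵢ is the share of nearby POIs falling in domain i. It is 0 when every POI belongs to one domain and reaches 1 − 1/5 = 0.8 when POIs are spread evenly over the five domains. The abundance-proportional domain weights mentioned above are the same shares pᵢ. The sketch below covers only these two ingredients; how the paper combines them with the factor average into the five domain-specific scores is not specified in the summary and is not reproduced here. Function names are hypothetical:

```python
def domain_weights(counts):
    """Weights proportional to POI abundance in each domain (they sum to 1)."""
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in counts}
    return {d: c / total for d, c in counts.items()}


def gini_simpson(counts):
    """1 - sum(p_i**2) over domains: 0 = all POIs in one domain."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no POIs nearby: treat as no balance rather than perfect balance
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

The explicit zero-POI case matters: without it, an RBC with no nearby services would evaluate to 1 − 0 = 1 and look perfectly balanced.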
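Inverse Distance Weighting estimates a value at an unsampled location as a weighted mean of known values, with weights 1/dᵖ. A minimal sketch of the technique, using all known points and power p = 2 (a common choice); ArcMap's IDW tool also restricts each estimate to a search neighbourhood, which this sketch omits. The function name is hypothetical:

```python
import numpy as np


def idw(known_xy, known_values, query_xy, power=2.0):
    """Inverse Distance Weighting over all known points.

    A query point that coincides with a known point takes that point's value.
    """
    known = np.asarray(known_xy, dtype=float)
    vals = np.asarray(known_values, dtype=float)
    out = []
    for q in np.asarray(query_xy, dtype=float):
        d = np.linalg.norm(known - q, axis=1)
        hit = d == 0
        if hit.any():
            out.append(vals[hit][0])  # exact match: avoid division by zero
            continue
        w = 1.0 / d ** power
        out.append(np.sum(w * vals) / np.sum(w))
    return np.array(out)
```

Here the known points would be RBC locations with one livability score each, and the query points a regular grid covering downtown Shanghai.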
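A Standard Deviational Ellipse summarizes the centre, spread, and orientation of a point pattern, optionally weighted (here, for example, by livability score). The sketch below derives a one-standard-deviation ellipse from the eigen-decomposition of the weighted covariance matrix of the coordinates. It is a plain covariance-based version; ArcMap's tool may apply additional scaling, so axis lengths need not match its output exactly. The function name is hypothetical:

```python
import numpy as np


def standard_deviational_ellipse(xy, weights=None):
    """Covariance-based one-standard-deviation ellipse.

    Returns (mean centre, SD along major axis, SD along minor axis,
    major-axis angle in degrees counter-clockwise from the x/east axis, in [0, 180)).
    """
    pts = np.asarray(xy, dtype=float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, dtype=float)
    centre = np.average(pts, axis=0, weights=w)
    d = pts - centre
    cov = (d * w[:, None]).T @ d / w.sum()   # 2x2 weighted population covariance
    evals, evecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    evals = np.clip(evals, 0.0, None)        # guard against tiny negative round-off
    major = evecs[:, 1]                      # direction of greatest spread
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    return centre, np.sqrt(evals[1]), np.sqrt(evals[0]), angle
```

An angle near 0° or 180° with a long major axis corresponds to the pronounced east–west trend reported in the findings.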
Distance bands: 1 km and 2 km radii were chosen to approximate comfortable walking distances and short trips via transit, aligning with thresholds used in prior literature on accessibility and the "15-minute city" concept.
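Selecting the POIs within the 1 km or 2 km band of an RBC requires a distance between latitude/longitude pairs. A minimal sketch using the haversine great-circle formula on a spherical Earth; the function names are hypothetical. Coordinates from different sources may use different offset systems (Baidu uses its own BD-09 system), so they would need to be brought into one system first, which is not shown:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pois_within(rbc, pois, radius_m):
    """Return POIs given as (lat, lon, category) within radius_m of an RBC (lat, lon)."""
    lat0, lon0 = rbc
    return [p for p in pois if haversine_m(lat0, lon0, p[0], p[1]) <= radius_m]
```

Note that these are straight-line radii; walking distance along the street network is usually longer.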
Key Findings
- Dataset scale and composition:
  - RBCs: 15,994 (2020).
  - Transport POIs: 13,334 bus stops; 1,506 metro entrances (2020).
  - Total POIs used (Table 1 categories across eight districts): 828,845. Top categories by share: Named entities 14.88% (123,353); Shops 13.42% (111,256); Daily life services 10.57% (87,610); Indoor services 10.26% (85,038); Real estate 9.68% (80,231); Restaurants 8.26% (68,445); Enterprise 7.48% (62,035); others follow.
- RBC and urban characteristics, spatial patterns (Fig. 1):
  - Population density is highest in the northwest; overall, the distribution is uneven.
  - Housing prices peak in a circular pattern centered on the Bund and are generally higher in western Shanghai than in surrounding areas.
  - Building ages: Most RBCs were constructed after 1980; newer clusters lie in southeastern areas (e.g., Pudong), while older clusters remain in districts such as Jingan and Hongkou.
  - RBCs are strongly integrated with bus and road networks, indicating broad mobility options.
- Livability distributions within 1 km (Fig. 2):
  - Education and medical services show similar spatial patterns, with resources bundled around RBCs; similar scores suggest concentration without wide diversity across locations.
  - Recreation-related livability is relatively low citywide and clusters mainly in the western part of downtown, in line with population and higher housing prices.
  - Living service and transportation service scores tend to run opposite to RBC density clusters: in the model, excessive POI clustering near RBCs can imply congestion over short distances, which lowers scores. Living service scores are generally even across the city, with the highest averages in downtown areas.
- Livability within 2 km (Fig. 3):
  - Education is highest in the middle of downtown; medical services are highest in the north. More RBCs achieve high medical livability than high educational livability at the 2 km scale.
- Trend analysis (SDE; Fig. 4):
  - Livability exhibits a pronounced east–west distribution across the downtown area for most dimensions, consistently in both the 1 km and 2 km analyses.
  - High-livability hotspots (except recreation) cluster in central–northwest districts (e.g., Hongkou, Yangpu).
  - Recreation hotspots are sparser and more localized, often in older districts with historical or green amenities (e.g., Huangpu), and lie farther from RBCs.
  - Transport service livability displays the largest spatial dispersion and the broadest coverage, consistent with a metropolis with intensive commuting and access needs.
- Aggregate dimensional insights:
  - On average, living services attain the highest livability scores, followed by transportation service; both are essential to daily needs.
  - Educational and medical facilities are relatively dispersed yet mainly concentrated in the west, influencing housing prices and RBC density patterns.
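Each percentage in the dataset summary is a category count divided by the 828,845-POI total. Recomputing the shares from the counts quoted above (no additional data) reproduces the reported two-decimal values; the names below are illustrative only:

```python
# Counts for the seven largest Table 1 categories, as reported in the findings.
TOTAL_POIS = 828_845
TOP_CATEGORY_COUNTS = {
    "Named entities": 123_353,
    "Shops": 111_256,
    "Daily life services": 87_610,
    "Indoor services": 85_038,
    "Real estate": 80_231,
    "Restaurants": 68_445,
    "Enterprise": 62_035,
}


def shares_percent(counts, total):
    """Each category's share of the total, in percent, rounded to two decimals."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


print(shares_percent(TOP_CATEGORY_COUNTS, TOTAL_POIS))
```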
Discussion
The study demonstrates that open-source data can uncover fine-grained, sub-city patterns of livability in a rapidly urbanizing megacity. By centering the analysis on RBCs and their 1–2 km surroundings, the approach reveals uneven spatial distributions across livability dimensions. The results indicate that living services and transport accessibility are broadly strong, especially downtown, whereas education and medical services show spatial disparities and recreation opportunities cluster in a few districts, often western or historically endowed ones. These findings address the research question by identifying priority geographies for intervention: areas to the north, east, and south with weaker education and medical access, and regions lacking recreational and green amenities. In policy terms, this points to expanding medical and educational facilities in underserved areas, where many RBCs lack such facilities within 2 km, and to improving the spatial equity of green and recreational spaces. The paper aligns with related spatially explicit livability work in cities such as Singapore, Melbourne, Vancouver, and Wuhan, which similarly identifies multi-dimensional hotspots and potential trade-offs. Although methods differ across studies, their convergence on intra-urban disparities underlines the importance of spatially resolved, multi-domain livability diagnostics for planning decisions.
Conclusion
This work contributes a scalable, open-source, data-driven framework to evaluate urban livability at the sub-city level using RBC-centered analysis of housing characteristics, population density, transport accessibility, and POI availability/diversity. Applied to Shanghai, it reveals systematic spatial disparities across livability dimensions and identifies priority areas for targeted investments—particularly in education, healthcare, and recreation/green spaces. The approach reduces dependence on restricted official data, leverages frequently updated POIs, and offers actionable insights for planners and decision-makers. Future directions include: (a) integrating additional environmental quality indicators (e.g., air pollution, temperature, hydrology) as more granular datasets become available; (b) expanding recreation metrics beyond green spaces to include squares, sports, cultural, and waterfront venues; (c) improving RBC coverage/accuracy via multiple data sources; and (d) exploring avenues for composite livability measures or multi-criteria aggregation while preserving interpretability and addressing denominator comparability across dimensions. The methodology is adaptable to other cities, but results should be interpreted within each city’s socio-cultural and infrastructural context.
Limitations
- Data integration and resolution mismatch: Housing, population, and RBC datasets differ in spatial resolution and coverage. Population was superimposed on RBCs and the number of buildings was used as a proxy for area, introducing uncertainty.
- Omitted environmental dimensions: Eco-environmental indicators (e.g., temperature, hydrology, air quality such as NOx or PM2.5, waste management) could not be included because of data availability and format constraints.
- Partial recreation coverage: Recreation was primarily proxied by green spaces; other recreational spaces (public squares, sports, cultural, waterfront) were not fully integrated because of data limitations.
- Incomplete RBC coverage: Despite the use of a large real estate platform, some residential information may be missing; future work should triangulate multiple sources.
- No single composite livability score: The dimensional scores use a Gini–Simpson-based balance index with different denominators across dimensions, which precludes straightforward aggregation into a single composite score.
- Context specificity and generalization: Findings are tailored to Shanghai's geography and datasets; caution is required when generalizing to other cities with different socio-economic and infrastructural profiles.