Introduction
Maintaining and restoring functional habitat networks is crucial for biodiversity conservation. Remnants of naturally dynamic forest landscapes are key biodiversity hotspots, but they are scarce and declining globally due to intensive forestry and other human impacts. The concept of High Conservation Value Forests (HCVF) is used to identify and protect forests with high levels of naturalness, characterized by native species, long forest continuity, structural complexity, and low anthropogenic influence. Mapping HCVF is essential for setting evidence-based conservation targets, for spatial planning, and for forest landscape restoration. Sweden, a major wood producer with a history of intensive forestry, serves as a case study because of its large forest area, diverse ecoregions, and high degree of forest transformation. The existing national-scale HCVF dataset is incomplete and out of date, highlighting the need for a more comprehensive and spatially explicit approach.
Literature Review
Previous efforts to map forests with high conservation value have employed various methods, including systematic field inventories, analyses of historical databases, and remote sensing-based approaches. However, most studies using high-resolution, wall-to-wall datasets either adopt a global or continental perspective, resulting in a resolution too coarse for landscape-scale planning, or focus only on local areas of conservation interest. Existing large-scale models often suffer from weak predictive performance because their training data are limited and spatially clustered, which forces them to extrapolate beyond the feature space covered by the reference data. Studies in Romania and Norway offer similar frameworks but lack the spatial scale and comprehensive data integration of the current study.
Methodology
This study uses a data mining and predictive modeling approach to map HCVF across Sweden's forestlands. The Random Forest (RF) machine learning algorithm was trained and tested on the existing national-scale HCVF database. The model incorporates publicly available high-resolution spatial datasets describing landscape configuration, topography, forest structure, and socio-economic factors at multiple scales. Sweden was divided into four biogeographic regions (North boreal, South boreal, Hemiboreal, and Nemoral), and an independent RF model was trained for each region. Rather than producing a binary classification, the models predict continuous values of the relative likelihood of HCVF occurrence, which gives more nuanced information about the gradient of forest naturalness. Variable selection was performed to reduce collinearity among predictors. Models were evaluated with 10-fold spatial cross-validation (SCV) using several performance metrics: ROC AUC, PR AUC, Pearson's r, the Brier score, and the Matthews correlation coefficient (MCC). Model validation involved two independent datasets: stand-level data from Sveaskog (n=57,548) and plot-level data from the Swedish National Forest Inventory (NFI; n=13,775). Partial dependence plots were used to visualize the relationships between key predictor variables and the relative likelihood of HCVF occurrence. After the 10-fold SCV, the final predictions were generated using all available training data. The models' robustness was assessed by comparing their performance against alternative models that used different variable combinations, different scales, and a different modeling technique (Logistic Regression).
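The idea behind spatial cross-validation can be illustrated with a minimal sketch. The summary does not say how the study formed its spatial folds, so the block-based scheme below is an assumption chosen for simplicity: samples are grouped into square grid blocks, and whole blocks (never individual samples) are assigned to folds, so that test samples are spatially separated from training samples. The function names, the `block_size` value, and the synthetic coordinates are all hypothetical.

```python
import random


def spatial_block_folds(coords, block_size, n_folds=10, seed=0):
    """Assign each sample to a CV fold by the grid block it falls in.

    coords     : list of (x, y) tuples in projected units (e.g. metres)
    block_size : side length of the square blocks, same units (assumed value)
    Returns a list with one fold index (0..n_folds-1) per sample.
    """
    # Identify the grid block that contains each sample.
    block_of = [(int(x // block_size), int(y // block_size)) for x, y in coords]
    blocks = sorted(set(block_of))

    # Shuffle the blocks reproducibly, then deal them round-robin to folds,
    # so every sample in a block ends up in the same fold.
    rng = random.Random(seed)
    rng.shuffle(blocks)
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]


def train_test_splits(folds, n_folds=10):
    """Yield (train_indices, test_indices) for each held-out fold."""
    for k in range(n_folds):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test


# Synthetic example: a 40 x 40 lattice of points spaced 500 m apart,
# grouped into 5 km blocks (16 blocks spread over 10 folds).
coords = [(x * 500.0, y * 500.0) for x in range(40) for y in range(40)]
folds = spatial_block_folds(coords, block_size=5000.0)

for train_idx, test_idx in train_test_splits(folds):
    # A model would be fit on train_idx and scored on test_idx here.
    pass
```

In a regionalized setup like the one described above, a scheme of this kind would be run separately within each biogeographic region. Larger blocks give stronger spatial separation between training and test data but fewer distinct blocks per fold.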
Key Findings
The RF models demonstrated high predictive accuracy (ROC AUC: 0.89-0.90; PR AUC: 0.84-0.89) and successfully captured the gradient of forest naturalness. Key influential variables across all regions included forest structural properties (height, variation in height), multi-scale landscape patterns of forest management intensity, and topographic complexity. HCVF were more likely in areas with complex topography, higher elevation, taller trees, and structurally diverse stands. Variables representing negative human impact were negatively related to HCVF likelihood. Both stand-level (Sveaskog) and plot-level (NFI) validations confirmed that the predicted relative likelihoods of HCVF occurrence align with varying levels of forest naturalness and conservation value. Comparison with alternative models, such as one using all variables without addressing collinearity and a 'global' model without regional stratification, highlighted the superior performance of the selected regionalized model. Logistic Regression showed similar overall predictive power but lower precision than Random Forest.
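For readers unfamiliar with the evaluation metrics, the sketch below shows how three of them (ROC AUC, the Brier score, and MCC) can be computed from observed labels and predicted likelihoods. It is a plain-Python illustration, not the study's code; the study presumably relied on established library implementations. The 0.5 threshold used to binarize predictions for MCC is an assumption, and the four-sample data set is made up for the example.

```python
import math


def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted likelihood and outcome."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def mcc(y_true, y_prob, threshold=0.5):
    """Matthews correlation coefficient after thresholding (threshold is assumed)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up example: 1 = HCVF present, 0 = absent.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
matthews = mcc(y_true, y_prob)
```

ROC AUC and the Brier score work directly on the continuous likelihoods, which matches the study's emphasis on gradients rather than binary classes. MCC requires a cut-off, so its value depends on the threshold chosen.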
Discussion
This study demonstrates the potential of integrating readily available spatial data and machine learning to create detailed, accurate maps of forest conservation value at a fine spatial resolution. The wall-to-wall mapping of HCVF relative likelihood provides crucial information for strategic conservation planning and forest management. The continuous gradient of predicted values allows for nuanced decision-making, identifying areas for protection, restoration, or continued intensive forestry, depending on local priorities and management goals. The approach helps bridge the gap between ambitious conservation targets and practical implementation, offering actionable insights for effective spatial planning and habitat network design. The findings are relevant for countries and regions facing similar challenges in balancing biodiversity conservation and intensive forestry.
Conclusion
This study presents a robust framework for predicting HCVF at a 1-hectare resolution, using readily accessible data and machine learning. The resulting maps provide valuable information for evidence-based conservation planning and forest management. Future research could explore the integration of additional data sources (e.g., species occurrence data) to further enhance predictive accuracy and refine the assessment of conservation values. Applying this methodology to other regions and integrating it with socio-economic considerations will aid in the development of more sustainable forest management strategies.
Limitations
The HCVF training data, while comprehensive, are not a random probabilistic sample, which may bias the model's predictions. The Global Forest Change (GFC) data used as a proxy for human-related pressure do not distinguish between different causes of forest change. The applicability of this approach in other regions may be limited by the availability of comparable high-resolution spatial datasets, particularly LiDAR-derived forest structure data. Field validation should always precede any final conservation decision based on the model's predictions.