
Soundscapes and deep learning enable tracking biodiversity recovery in tropical forests

J. Müller, O. Mitesser, et al.

Discover how Jörg Müller, Oliver Mitesser, and their team harness bioacoustics and metabarcoding to revolutionize monitoring tropical forest recovery. Their innovative research uncovers strong correlations between automated acoustic measurements and restoration progress, offering a cost-effective solution to combat climate change and protect biodiversity.

Introduction

The study addresses how rapidly and reliably biodiversity recovers in tropical forests following abandonment of agricultural land and how to cost-effectively monitor that recovery. Tropical forests are central to climate mitigation and biodiversity conservation, yet biodiversity recovery is less predictable than carbon accumulation. Market-based conservation and certification efforts require scalable, transparent biodiversity monitoring to avoid carbon-only restoration (e.g., monocultures) and potential greenwashing. The research tests whether soundscapes and automated deep learning models derived from passive acoustic recordings can track faunal community recovery across a chronosequence of tropical forest regeneration, and whether acoustically derived metrics reflect broader biodiversity, including non-vocal taxa. The authors hypothesize that community configurations derived from CNN models and composite acoustic indices will correlate with recovery gradients and track community composition better than simple richness metrics.

Literature Review

Prior work shows that many taxa (amphibians, birds, mammals, insects) vocalize, enabling acoustic monitoring. Soundscape indices have tracked degradation, fragmentation, and post-disturbance recovery in various tropical systems, but results have been mixed and context-dependent. No single acoustic index consistently predicts biodiversity; indices capture different facets (signal-to-noise, frequency variation, entropy, complexity). Some studies found that individual indices (e.g., the Acoustic Complexity Index) correlate strongly with abundance or richness, while others found no such relationship. Beyond indices, automated species identification methods are advancing: traditional supervised ML requires feature engineering and may struggle with noisy, diverse data, whereas deep learning (e.g., CNNs) increasingly identifies birds, bats, and amphibians and can automate community assessment when models and labeled datasets are available. A key bottleneck is the need for large, labeled training datasets, particularly in hyper-diverse tropical communities. Community composition has predicted environmental gradients elsewhere, motivating the hypothesis that CNN-derived community axes will correlate with forest recovery. Additionally, insect diversity often correlates with vertebrate diversity, suggesting cross-taxon congruence that sound-based metrics might capture.

Methodology

Study design and sites: The study covered 43 plots along a forest recovery chronosequence in the El Lavadito Chocó protected area (Ecuador), including active pastures, cacao plantations, secondary forests recovering for 1–34 years, and old-growth forests (159–615 m a.s.l.). Recovery was natural post-abandonment (no active planting).

Acoustic sampling: One Frontier Labs BAR-LT recorder was installed per plot (microphone facing down, ~1.7 m height). Recording schedule: 2 min every 15 min, continuously for two weeks in October 2021 (Julian days 299–314) at 44.1 kHz.

Expert identifications: Experts manually identified birds and mammals from 2-min files at key times (e.g., 06:00–07:15, midday, 16:00–18:15) across two days without heavy rain; amphibians were also identified. Presence frequency per plot was computed from the files containing a species. Inter-observer bias was assessed by having a second expert assess offset files; results were similar.

Acoustic indices: Using AnalysisProgram.exe (Towsey framework), per-file indices were computed and averaged over two weeks. Selected independent indices included Soundscape Saturation, Acoustic Diversity Index, Bioacoustic Index, Acoustic Evenness, Temporal Entropy, Acoustic Complexity, Entropy of Frequency, and Events per Second (subset used in models as per Table 1). Data preprocessing used R (stringr).

CNN-based community composition: A multi-label MobileNetV2 CNN (transfer-learned from ImageNet), developed independently within the Arbiom framework, was trained on 401,685 one-minute recordings from 55 sites in the Ecuadorian Chocó (2019; AudioMoth and Guardian devices), targeting 115 song classes (112 species; 77 potentially present in the study region). Training used template-matched positive/negative samples, mel-spectrograms (3 s windows; STFT 0.1 s window, 0.025 s hop; 50 Hz–8 kHz; 224 bins), data augmentation via frequency-banded mixup, class balancing (target 700 positives, 50 negatives per class), binary cross-entropy loss, the Adam optimizer (lr=0.001), batch size 16, and 100 epochs. Validation set: 200 expert-labeled 1-min files; an additional independent 200 files were used for evaluation. Mean average precision was 96.8%; at threshold 0.5, mean F1 was 77.2% (precision 82.9%, recall 80.7%); at 0.8, mean F1 was 78.9% (precision 94.8%, recall 77.5%). For analysis, 3 s windows with 1 s overlap were scored; species presence per file used a 0.8 threshold. The CNN outputs were used to derive a bird community composition axis.

Insect sampling and metabarcoding: Autonomous LED light traps (UV off) operated one night per plot during the acoustic sampling period, primarily attracting Lepidoptera and Diptera. Insects were collected post-dusk and euthanized with chloroform. Large-bodied moths (Satyrinae, Sphingidae) and Coleoptera were targeted for morphology; remaining bulk samples were size-fractionated (8 mm sieve) to mitigate biomass bias. DNA extraction followed homogenization and ATL/Proteinase K digestion, using the DNeasy Blood & Tissue kit. A COI minibarcode was amplified with the Leray primer set; library prep used Illumina Nextera XT, with sequencing on a MiSeq (2×300 bp).

Bioinformatics: The pipeline comprised paired-end merging (USEARCH), adapter trimming (CUTADAPT), quality filtering, dereplication, chimera removal, and pre-clustering (VSEARCH), followed by SWARM v3.1.0 clustering to OTUs. Cleaning thresholds were applied (<0.01% of per-sample reads; negative-control filtering). OTUs were BLASTed against NCBI nt and custom BOLD/GenBank databases (Animalia data from BOLD; confidence filtering). OTUs were assigned to BIN clusters (Barcode Index Numbers) as proxies for species-level units, enabling ecological analyses even with incomplete reference libraries.

Statistical analyses: For vocalizing vertebrates, species richness per plot and richness of species observed in old-growth plots were computed. Community composition was ordinated via NMDS (vegan::metaMDS, Bray-Curtis), with axes rotated by PCA to maximize variance on axis 1. The first axis score represented a linear recovery gradient and was used for modeling. The same approach was applied to CNN-derived bird communities and metabarcoded nocturnal insect communities. Linear Gaussian models related (1) vertebrate NMDS axis 1, (2) vertebrate richness (log-transformed), (3) old-growth vertebrate richness (log-transformed), and (4) nocturnal insect NMDS axis 1 to (i) composite acoustic indices and (ii) the CNN-derived bird community axis. Residuals were checked for spatial independence (mgcv cross-correlation); all models showed spatially independent residuals.
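The CNN scoring step described in the Methodology (3 s windows with 1 s overlap, species presence per file at a 0.8 threshold) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the example scores, and in particular the aggregation rule (a species counts as present in a file if any window reaches the threshold) are assumptions made for illustration, since the summary does not state how window scores were combined.

```python
def window_starts(duration_s, window_s=3.0, overlap_s=1.0):
    """Return start times (s) of sliding windows that fit fully in the file."""
    hop = window_s - overlap_s  # 3 s window with 1 s overlap -> 2 s hop
    if hop <= 0:
        raise ValueError("overlap must be smaller than the window length")
    starts = []
    k = 0
    while k * hop + window_s <= duration_s + 1e-9:
        starts.append(k * hop)
        k += 1
    return starts


def species_present(window_scores, threshold=0.8):
    """Assumed rule: present if any window score reaches the threshold."""
    return {
        species: max(scores) >= threshold
        for species, scores in window_scores.items()
        if scores
    }


# A 2-min recording, as in the sampling schedule
starts = window_starts(120.0)

# Hypothetical per-window CNN scores for two placeholder classes
scores = {
    "species_a": [0.10, 0.85, 0.40],
    "species_b": [0.30, 0.79, 0.50],
}
presence = species_present(scores)
print(len(starts), presence)
```

Under these assumptions a 2-min file yields 59 windows, and only the class with a window score at or above 0.8 is marked present. A stricter rule (for example, requiring several windows above the threshold) would trade recall for precision.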

Key Findings

- Expert-identified vocalizing vertebrate communities showed a clear gradient in NMDS space along the recovery chronosequence. Old-growth forests were generally distinct from regenerating sites, with some convergence at later succession stages and high variation in early recovery.
- Species richness of vocalizing vertebrates decreased along the recovery gradient, potentially due to transient species use of agricultural plots and spill-over from surrounding forests.
- Composite acoustic indices had high explanatory power for vertebrate community composition (adjusted R² = 0.62) and moderate power for old-growth species richness and nocturnal insect composition, but low for total vertebrate richness (adj. R² = 0.20). For nocturnal insects, Entropy of Frequency, Soundscape Saturation, and Events per Second increased toward old-growth-like communities.
- A CNN-derived bird community composition metric correlated well with the restoration gradient (adj. R² ≈ 0.69) and was highly correlated with both expert-derived vertebrate community axis and nocturnal insect community axis, indicating that automated acoustic community metrics capture broader biodiversity recovery beyond vocal taxa.
- Excluding 13 vocalizing insect species from light-trap data did not change insect community axis (Pearson r = 0.986), confirming light-trap assemblages represent predominantly non-vocalizing insects.
- Species composition, rather than richness, served as a more reliable indicator of forest recovery.
- Automated sound-based measures (acoustic indices and CNN-based community composition) effectively tracked recovery across a gradient from active agriculture through regenerating forest to old-growth, using robust, reproducible, and cost-effective data.
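The findings above are reported as adjusted R² and Pearson r. As a reminder of how these statistics are computed, here is a small pure-Python sketch for a single-predictor linear model. The data are made-up toy values, not the study's measurements, and the helper names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Share of variance in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: x could stand for an acoustic axis, y for a community axis
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)
adj = adjusted_r2(r2, n=len(x), p=1)
r = pearson_r(x, y)
print(round(r2, 3), round(adj, 3), round(r, 3))
```

The adjustment penalizes R² for the number of predictors relative to the number of observations, which matters most when several acoustic indices are combined in one model on a limited number of plots.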

Discussion

The findings support the hypothesis that soundscape-derived metrics and deep-learning-based community composition track biodiversity recovery in tropical forests. Community composition of vocalizing vertebrates aligns with recovery time and distinguishes old-growth from regenerating sites, whereas richness alone may be misleading. Composite acoustic indices and CNN-derived bird community axes not only reflect vertebrate community reassembly but also predict patterns in non-vocal nocturnal insects, demonstrating cross-taxon congruence. This generality suggests that passive acoustic monitoring can serve as a scalable, transparent tool for evaluating restoration outcomes, aiding adaptive management and reducing reliance on proxies like stand age or carbon stocks that overlook anthropogenic pressures (e.g., logging, hunting). The approach addresses the pressing need for standardized, reproducible biodiversity monitoring in market-based conservation and certification contexts, potentially mitigating greenwashing by directly quantifying biodiversity gains. By leveraging long-term, re-analyzable audio datasets, conservation practitioners can enhance evidence-based decision-making and assess the biodiversity value of restoration projects.

Conclusion

Soundscape analysis and CNN-based community metrics provide robust, scalable indicators of tropical forest faunal recovery, capturing community composition across a full gradient from agriculture to old-growth. The study demonstrates that automated measures track not only vocalizing vertebrates but also non-vocal insect community dynamics, highlighting species composition as a superior indicator to richness for restoration monitoring. Future work should (i) expand and improve AI models across taxa and regions, (ii) build global, open sound repositories for multiple vocalizing groups beyond birds, (iii) validate model performance along diverse environmental gradients regionally, and (iv) standardize passive acoustic monitoring protocols. Publicly accessible soundscape datasets will enable retrospective analyses with advancing methods, improve transparency, and support the development and monetization of biodiversity credits linked to carbon removal while reducing greenwashing.

Limitations

- Acoustic indices individually showed variable and often low predictive power; performance improved when combined, suggesting sensitivity to context and index selection.
- The CNN covered a subset of regional bird species; model performance and generality are limited by the availability and representativeness of labeled training data and may be region-specific.
- Early-stage recovery plots exhibited high variability influenced by surrounding landscape context; recovery age alone is an imperfect proxy and does not capture additional anthropogenic impacts (e.g., logging, hunting).
- Light traps preferentially sample nocturnal, flying insects (e.g., Lepidoptera, Diptera), potentially underrepresenting other insect groups; only one night of trapping per site.
- Metabarcoding introduces biases (e.g., primer specificity, biomass-driven read count disparities); incomplete barcode reference libraries (especially for ‘dark taxa’) can affect taxonomic resolution and assignments.
- Recording conditions (weather, noise) and device placement may influence acoustic data quality; expert identifications, while cross-checked, are still subject to detection and observer biases.
- The CNN was trained and validated on datasets from similar regions; broader generalization to other tropical systems requires further validation.
