Earth Sciences
Meta-learning to address diverse Earth observation problems across resolutions
M. Rußwurm, S. Wang, et al.
METEOR is a deep meta-learning model for Earth observation that addresses remote sensing problems across varied spatial and spectral resolutions. Developed by Marc Rußwurm, Sherrie Wang, Benjamin Kellenberger, Ribana Roscher, and Devis Tuia, the model adapts to new geospatial problems from only a handful of labeled examples, outperforming conventional transfer-learning approaches.
Introduction
The study addresses the challenge that Earth observation (EO) problems are typically tackled in isolation, despite shared structure across tasks, sensors, geographies, and label schemes. Deep models require large annotated datasets and suffer from covariate and concept shifts across regions and tasks. The research question is whether a single meta-learned model can capture transferable knowledge from global land cover tasks and rapidly adapt to diverse, heterogeneous EO problems using few labeled examples. The purpose is to enable efficient adaptation across varying spatial/spectral resolutions and class spaces with minimal labels. The work positions METEOR within transfer learning and meta-learning, aiming to overcome limitations of current approaches that operate within homogeneous problem families or single sensors.
Literature Review
The paper situates METEOR among transfer learning and meta-learning approaches. It reviews:
- Model-based transfer learning and emerging remote sensing foundation models (e.g., RingMo, SSLTransformerRS), pre-trained on heterogeneous datasets and fine-tuned on downstream tasks.
- Meta-learning taxonomies: metric-based (e.g., Prototypical Networks) and optimization-based (e.g., MAML and variants), with prior remote sensing applications mainly confined to homogeneous settings such as high-resolution aerial RGB scene classification and medium-resolution multispectral land cover/cropland mapping.
- Limitations of prior work that focuses on single data sources or problem families and does not address heterogeneous transfer across resolutions and sensors.
- Recent general-purpose featurization approaches, such as MOSAIKS, that show cross-domain utility.
The paper also discusses normalization issues in meta-learning (transductive batch norm's pitfalls under class imbalance) and proposed alternatives (TaskNorm), motivating the use of instance normalization for realistic, imbalanced EO tasks.
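The transductive batch-norm pitfall is easy to demonstrate numerically. In the sketch below (NumPy, with random feature vectors standing in for network activations; in the actual CNN, instance norm operates per channel over spatial positions rather than per feature vector), the same sample receives a different batch-norm output depending on the batch's class balance, while its instance-norm output is unaffected:

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(x, eps=1e-5):
    """Transductive batch norm: per-feature statistics over the whole batch."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def instance_norm(x, eps=1e-5):
    """Instance norm: statistics per sample, independent of the rest of the batch."""
    return (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + eps)

pos = rng.normal(3.0, 1.0, size=(8, 16))   # activations of one class
neg = rng.normal(0.0, 1.0, size=(8, 16))   # activations of another class

balanced   = np.vstack([pos[:4], neg[:4]])  # 4-vs-4 batch
imbalanced = np.vstack([pos[:7], neg[:1]])  # 7-vs-1 batch, same first sample

# how much the normalized output of the SAME sample (pos[0]) changes
bn_shift = np.abs(batch_norm(balanced)[0] - batch_norm(imbalanced)[0]).max()
in_shift = np.abs(instance_norm(balanced)[0] - instance_norm(imbalanced)[0]).max()
```

Here `bn_shift` is clearly nonzero while `in_shift` is exactly zero: batch-norm predictions leak the batch's class composition, which is why transductive MAML degrades on realistic, imbalanced test tasks.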
Methodology
METEOR is a heterogeneous transfer learning framework built on optimization-based meta-learning. Core components:
- Meta-model: a ResNet-12 with instance normalization in place of batch normalization; 15 input channels to support Sentinel-1 (2 radar bands) and Sentinel-2 (13 optical bands); a single-output head for binary one-vs-all classification. Pre-training uses second-order MAML on Sen12MS tasks: for each geographic area, 16 images with four randomly selected land cover classes are split 8 train / 8 test and reformulated as binary one-vs-all problems by selecting one target class and treating the others as negatives (a 2-shot, 4-way setting). The objective is binary cross-entropy; the inner loop takes one stochastic gradient descent step with step size α = 0.32; the outer loop uses Adam (learning rate 0.001, decayed by 0.1 on plateau) over batches of 16 tasks, with early stopping on validation loss (up to 40,000 iterations).
- Task-model: an ensemble of one-vs-all classifiers. For a downstream task with n classes, n binary classifiers share the architecture and are initialized from the meta-model weights; each is fine-tuned with SGD (step size 0.32–0.4, 20–60 steps) using binary cross-entropy. At inference, per-class scores are combined via softmax to obtain class probabilities (per-class sigmoids are used for the qualitative occlusion analyses).
- Spectral adaptation: subsets of the first-layer convolutional filter banks are selected dynamically to match the spectral bands available in the downstream sensor; varying spatial resolutions require no changes thanks to the ResNet architecture's flexibility.
- Segmentation variant: for pixel-wise outputs (e.g., marine debris), global average pooling is removed and the final linear layer is replaced with 1×1 convolutions that produce low-resolution score maps (e.g., 9×9 for 64×64 inputs), which are upsampled via bicubic interpolation; fine-tuning uses pixel-wise cross-entropy.
- Training resources: pre-training ran on 2× NVIDIA V100 GPUs (~48 h; ~5 kg CO2e). Inference and fine-tuning timings are reported, and code and weights are released.
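As an illustration, the one-vs-all task-model ensembling and the spectral filter selection described above can be sketched with toy linear classifiers standing in for the ResNet-12. The band indices, feature dimensions, and zero initialization below are hypothetical placeholders, not the released weights; only the step size (0.32), step count (20), and the sigmoid/softmax combination follow the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for the meta-model's first conv layer: filters over 15 spectral channels
meta_filters = rng.normal(size=(32, 15))            # (n_filters, n_channels)

def adapt_spectral(filters, band_idx):
    """Spectral adaptation: keep only the filter channels matching the downstream bands."""
    return filters[:, band_idx]

# hypothetical 4-band downstream sensor mapped to channels 1, 2, 3, 7 of the meta-model
sensor_filters = adapt_spectral(meta_filters, [1, 2, 3, 7])   # shape (32, 4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_one_vs_all(features, labels, n_classes, w0, lr=0.32, steps=20):
    """Fine-tune n binary one-vs-all classifiers, each starting from the meta-weights w0."""
    classifiers = []
    for c in range(n_classes):
        y = (labels == c).astype(float)     # target class vs. all others
        w = w0.copy()
        for _ in range(steps):              # SGD on binary cross-entropy
            p = sigmoid(features @ w)
            w -= lr * features.T @ (p - y) / len(y)
        classifiers.append(w)
    return np.stack(classifiers)            # (n_classes, feature_dim)

def predict(features, classifiers):
    """Combine per-class one-vs-all scores into class probabilities via softmax."""
    logits = features @ classifiers.T
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# toy 5-shot task with 3 well-separated classes (features stand in for CNN embeddings)
feats = np.vstack([m + 0.3 * rng.normal(size=(5, 3)) for m in 3.0 * np.eye(3)])
labels = np.repeat(np.arange(3), 5)
probs = predict(feats, finetune_one_vs_all(feats, labels, 3, np.zeros(3)))
```

Because each class gets its own binary head, the same meta-model serves tasks with any number of classes; the cost is one fine-tuning run per class, which is the scaling limitation noted later for many-class datasets.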
Key Findings
- Normalization is critical under class imbalance: On idealized, balanced Sen12MS test tasks, MAML with transductive batch norm achieved the highest accuracy (~0.85), but it performed worst on realistic, imbalanced DFC2020 tasks (~0.26 ± 0.05). Replacing batch norm with instance normalization yielded markedly better DFC2020 performance (~0.82 ± 0.08), outperforming TaskNorm-1 (~0.59 ± 0.24), conventional BN (~0.60 ± 0.18), and GroupNorm (~0.54 ± 0.20) while remaining competitive on Sen12MS (e.g., ~0.78 vs 0.85 for transductive BN). The MAML+IN configuration was used for all subsequent results and outperformed SparseMAML variants on realistic tasks.
- Within land cover (DFC2020, few-shot across 7 regions): METEOR achieved strong performance across shot counts (average rank 2.84), with accuracies of 61.5 ± 10.7 (1-shot), 69.2 ± 11.9 (2-shot), 78.6 ± 11.2 (5-shot), 81.5 ± 10.4 (10-shot), and 81.7 ± 11.9 (15-shot). It was statistically significantly better than SSLTRANSRS and the contrastive RGB approaches (SWAV, DINO, SECO), while SSL4EO (avg. rank 2.51), MOSAIKS (2.86), and a supervised BASELINE (2.99) were not significantly different from METEOR.
- Across heterogeneous 5-shot tasks (varying sensors/resolutions/classes): METEOR attained the best average rank (3.6) among all methods, followed closely by SWAV (rank 4.2) and MOSAIKS (4.3). Dataset-wise accuracies for METEOR: Human influence (AnthroProtect) 83.7; Crop type (DENETHOR) 75.6; Land cover (DFC2020-Kippa-Ring) 87.7; EuroSAT 60.9; Marine debris (floating objects) 90.8; Urban scenes (NWPU subset) 57.4. METEOR was significantly better than BASELINE, PROTO, IMAGENET, and SCRATCH (Wilcoxon signed-rank test), though no single model dominated all datasets.
- Qualitative case studies demonstrated versatility: 1-shot land cover classification in Kippa-Ring averaged 68% accuracy (3 splits) using only 5 labeled images; deforestation mapping in Roraima, Brazil (PlanetScope, 4 bands, 3 m) produced coarse 96×96 m no-forest probability maps aligning with visible clearing; urban scene classification on high-resolution RGB imagery achieved 65% accuracy on 5 classes, with occlusion sensitivity highlighting class-relevant structures; change detection in Beirut (a Sentinel-2 time series) detected the post-event state with 84.5% probability on the first post-explosion image and identified salient regions (crater/damage) via occlusion; marine debris segmentation (Sentinel-2) accurately delineated floating objects using only 5 annotated images.
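The occlusion-sensitivity analyses used in these case studies follow a standard recipe: mask one image patch at a time, re-score the image, and record how much the class score drops. A minimal NumPy sketch, where the score function is a toy stand-in for the task-model's per-class sigmoid output (the hot region and patch size are arbitrary):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4):
    """Slide an occluding patch over the image; record the score drop at each position."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0   # zero out one patch
            heat[i // patch, j // patch] = base - score_fn(occluded)
    return heat

# toy single-band image whose "class evidence" sits in the top-left corner
img = np.zeros((16, 16))
img[:4, :4] = 1.0

# hypothetical classifier score: responds only to the top-left region
score = lambda x: x[:4, :4].sum()

heat = occlusion_map(img, score)   # only the top-left cell shows a score drop
```

Regions whose occlusion causes the largest score drop are the ones the classifier relies on, which is how the crater/damage in Beirut and class-relevant urban structures were localized.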
Discussion
The findings support the hypothesis that a single meta-learned model can transfer across heterogeneous EO tasks and sensors with minimal labels. By addressing normalization pitfalls in meta-learning (using instance normalization), adapting input channels to variable spectral bands, and handling varying class counts through one-vs-all ensembling, METEOR learns transferable representations from global land cover tasks and rapidly adapts to new problems. Quantitatively, METEOR is among the top approaches within land cover and achieves the best overall rank across diverse tasks, indicating robust generalization across resolutions and application domains. Qualitative analyses show that fine-tuned task-models focus on semantically meaningful cues, and that the framework can support time-sensitive applications (e.g., deforestation/change detection) with few examples. The results highlight that models trained from a learning-from-tasks paradigm can provide a practical, efficient basis for a broad range of EO analyses where annotations are scarce.
Conclusion
METEOR introduces a practical meta-learning framework for EO that: (1) replaces transductive batch norm with instance normalization for robustness under class imbalance; (2) ensembles binary one-vs-all classifiers to flexibly handle varying numbers of classes; and (3) dynamically adapts first-layer filters to variable spectral bands. Pre-trained on global land cover tasks via MAML, a single meta-model adapts effectively to heterogeneous downstream problems, achieving competitive or leading performance across land cover, crop type, human influence detection, marine debris detection, and urban scene classification. METEOR thus reduces annotation needs and accelerates deployment across diverse EO applications. Future work includes broadening pre-training beyond land cover to additional labeled and unlabeled source tasks, improving scalability to many-class problems (beyond one-vs-all), and reducing meta-training memory requirements to enable larger backbones.
Limitations
- One-vs-all ensembling scales poorly with many classes, degrading performance on datasets with large class counts (e.g., full NWPU-RESISC45 with 45 classes), though performance remained acceptable up to 10 classes (EuroSAT).
- Meta-training with second-order MAML is memory-intensive compared to self-supervised pretraining; training larger backbones (e.g., ResNet-50/152) is currently impractical within this meta-learning setup.
- Pre-training tasks were limited to land cover; broader pre-training across tasks and modalities may further improve transfer.
- The approach relies on downstream spectral bands being a subset of the meta-model’s bands for direct filter transfer; truly novel bands/modalities not represented in pre-training may require additional strategies.