Automatic detection of methane emissions in multispectral satellite imagery using a vision transformer

Earth Sciences

B. Rouet-Leduc and C. Hulbert

Bertrand Rouet-Leduc and Claudia Hulbert present a deep learning approach that detects methane emissions in Sentinel-2 multispectral satellite data. Built on a Vision Transformer architecture, the method detects methane plumes as small as 0.01 km² (about 10,000 m²) and emissions roughly an order of magnitude smaller than earlier Sentinel-2 band-ratio techniques could find, and it is evaluated across diverse environments.

Introduction
Methane is the second largest contributor to global warming, responsible for roughly a third of observed warming. Its strong radiative forcing and short atmospheric lifetime make near-term mitigation impactful. Despite regulations and mitigation efforts, atmospheric methane continues to rise. Emissions can be intermittent or persistent, and a small fraction of large sources dominates totals; identifying and quantifying these sources is essential for mitigation and accurate inventories. Current approaches each have drawbacks: bottom-up inventories often underestimate emissions; ground, drone, and airborne measurements have limited coverage; and hyperspectral satellites offer precise spectral information at the cost of limited coverage or coarse spatial resolution, so many emissions go unquantified. Satellite methane detection typically relies on absorption features in the shortwave infrared (SWIR). Hyperspectral missions retrieve the methane column concentration (XCH4) but pay for their spectral resolution with either low coverage or low spatial resolution. In contrast, multispectral constellations such as Sentinel-2 provide global coverage with high spatial and temporal resolution but limited spectral information, which yields noisy detections and, to date, sensitivity mainly to large emissions (≈2–3 t/h over bright deserts and >10 t/h elsewhere). To address these trade-offs, the authors develop a deep learning architecture tailored to open-source multispectral data that identifies methane signatures and separates signal from noise, aiming to approach hyperspectral-like detection capability while retaining Sentinel-2’s coverage and resolution.
Literature Review
The paper situates its contribution within several strands of prior work: (1) Evidence that bottom-up inventories underestimate methane emissions and that a small number of point sources contribute disproportionately to totals. (2) Hyperspectral satellite retrievals of methane plumes, which enable quantification of XCH4 but trade off spatial coverage against resolution (e.g., PRISMA operates in target mode; Sentinel-5P has coarse resolution). Machine learning has been explored to refine hyperspectral retrievals. (3) Multispectral Sentinel-2 methods, which have relied on band ratios and multi-pass normalization (e.g., MBMP), detect only very large plumes (≈2–3 t/h on bright surfaces; >10 t/h in complex conditions), and often require manual masking and verification. (4) Emerging deep learning approaches for methane plume detection with hyperspectral data (e.g., PRISMA) and automated plume algorithms, which indicate the potential of data-driven methods but were previously constrained by limited training data and reliance on computationally intensive physical simulations. The authors aim to extend robust, automated detection to multispectral data at much lower emission rates using large-scale synthetic training and transformer-based models.
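The band-ratio, multi-pass idea in item (3) can be sketched in a few lines. This is a minimal illustration of one commonly described formulation (a scaled B12/B11 fractional difference, subtracted between two passes); the function names `mbsp` and `mbmp`, the zero-intercept least-squares scaling, and the toy data are assumptions made here, and the exact normalization and threshold of the baseline used in the paper may differ.

```python
import numpy as np


def mbsp(b11, b12):
    """Single-pass fractional B12 signal relative to B11 (hypothetical helper).

    c is the zero-intercept least-squares scale that best maps B12 onto B11
    over the whole scene, so a plume-free scene gives values near zero.
    """
    c = np.sum(b11 * b12) / np.sum(b12 * b12)
    return (c * b12 - b11) / b11


def mbmp(b11_t, b12_t, b11_ref, b12_ref):
    """Multi-pass signal: detection-date ratio minus reference-date ratio.

    Methane absorbs more strongly in B12, so plume pixels come out negative.
    """
    return mbsp(b11_t, b12_t) - mbsp(b11_ref, b12_ref)


# Toy scene: identical surface on both dates, 5% extra B12 absorption in a patch.
rng = np.random.default_rng(0)
b11 = 0.3 + 0.05 * rng.random((64, 64))
b12_ref = 0.8 * b11
b12_t = b12_ref.copy()
b12_t[30:34, 30:34] *= 0.95
signal = mbmp(b11, b12_t, b11, b12_ref)
```

In this toy case the patch stands out as a negative anomaly of about 5% against a near-zero background; on real scenes surface changes between passes add noise of comparable size, which is why such methods are limited to very large plumes.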
Methodology
Data: The study compiles ~900 pairs of Sentinel-2A/B L1C Top-of-Atmosphere reflectance tiles (≈110×110 km each), filtered for <25% cloud cover and chosen across diverse climates and land covers while avoiding known methane source regions. All 13 bands are resampled to 20 m (native for B11/B12) using nearest-neighbor resampling. Tiles are subdivided into 2.5×2.5 km scenes, yielding ~1,650,000 unique two-time-step samples (t−1, t). Input bands used: B1, B2, B3, B4, B5, B8, B8A, B9, B11, B12 at times t−1 and t.
Synthetic plumes: To compensate for limited ground truth in multispectral imagery, the authors generate ~20,000 synthetic methane plumes using analytical Gaussian plume models over a range of emission rates and wind speeds, adding 2D colored noise to mimic atmospheric turbulence. Plumes are embedded into B12 using the Beer-Lambert law at random locations within scenes. Half of the samples are augmented with plumes (positives); half remain unaltered (negatives).
Data splits: Train/validation/test splits use disjoint geographic regions and disjoint plume sets to enhance generalization: approximately 75% training (Canada, Egypt, England, Ethiopia, France, India, Iran, Japan, Kenya, Mali, Mexico, Morocco, US, Saudi Arabia), 10% validation (south Argentina, Belgium, China), and 15% testing (Afghanistan, north Argentina). Final databases: ~1,235,000 training, ~165,000 validation, and ~245,000 test samples.
Model architecture: A U-Net style encoder-decoder with a Vision Transformer (ViT, base variant, patch size 16) encoder and a convolutional decoder with deconvolutional upsampling. The model is trained from scratch to perform sequence-to-sequence segmentation, outputting a plume mask for time t. The rationale is that transformers capture long-range dependencies, which matches plume morphology, and that two time steps isolate transient absorption in B12 while the other bands help suppress confounders.
Training: Batches of 64 samples on 128×128 pixel crops (20 m resolution). Optimizer: Adam. Initial learning rate 1e-3, reduced by 0.1% upon plateauing validation performance (checked every 10 batches). Trained for 10 epochs; the model with the best validation performance is retained.
Baseline method: Multi-band multi-pass (MBMP) methane detection based on a normalized ratio involving B12 at two times; detection is performed by thresholding the negative of the MBMP output (threshold 0.5). This baseline represents state-of-the-art rule-based Sentinel-2 approaches.
Evaluation on synthetic data: Pixel-level F1 (harmonic mean of precision and recall) is computed across signal-to-noise ratios (SNR, defined as the mean B12 reflectance reduction due to the plume divided by the B12 standard deviation in the scene). ROC curves are computed for both the deep model and MBMP. The deep model classifies a pixel as plume if its output exceeds 0.2 (the threshold used consistently in the figures).
Evaluation on real plumes: The trained model is applied to the Carbon Mapper airborne catalog (AVIRIS-NG and GAO; 2526 plumes with emission rates from ~8 to ~9,000 kg/h, median ~240 kg/h) by generating 2.5×2.5 km Sentinel-2 pairs centered on the cataloged centroid: a reference date within 3 months prior and a detection date within 7 days prior. Only clear scenes (<0.5% cloud cover) are retained, yielding 7724 candidate pairs. A detection is counted if ≥2 contiguous pixels exceed the model threshold within 500 m of the cataloged location. Randomized controls apply the model at random nearby locations and times to estimate false detection rates in operational contexts (including tests over Southern New Mexico away from the Permian). Controlled release experiments (timed with Sentinel-2 overpasses) are assessed in supplementary figures. False positives are also assessed on unaltered test-set Sentinel-2 scenes without embedded plumes.
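The synthetic-plume step and the SNR definition above can be illustrated with a short sketch. It is a simplified stand-in under stated assumptions, not the authors' generator: the linear crosswind spread, the effective absorption coefficient `K_EFF`, the plume-pixel cutoff `min_frac`, and all function names are hypothetical, and the 2D colored noise used in the paper to mimic turbulence is omitted.

```python
import numpy as np

PIXEL_M = 20.0   # pixel size in metres after resampling (from the text)
M_CH4 = 0.01604  # molar mass of methane, kg/mol
K_EFF = 0.05     # ASSUMED effective B12 absorption coefficient, m^2/mol (placeholder, not calibrated)


def gaussian_plume_column(shape, source_rc, q_kg_h, wind_m_s, spread=0.1):
    """Column enhancement (mol/m^2) of a steady Gaussian plume blowing toward +x.

    sigma_y = spread * x is an ASSUMED linear crosswind spread, floored at
    half a pixel so the plume is never narrower than the grid can resolve.
    """
    rows, cols = np.indices(shape)
    x = (cols - source_rc[1]) * PIXEL_M   # downwind distance, m
    y = (rows - source_rc[0]) * PIXEL_M   # crosswind distance, m
    q_mol_s = q_kg_h / 3600.0 / M_CH4     # emission rate, mol/s
    column = np.zeros(shape, dtype=float)
    down = x > 0                          # only pixels downwind of the source
    sigma_y = np.maximum(spread * x[down], PIXEL_M / 2)
    column[down] = (q_mol_s / (wind_m_s * sigma_y * np.sqrt(2 * np.pi))
                    * np.exp(-y[down] ** 2 / (2 * sigma_y ** 2)))
    return column


def embed_plume(b12, column):
    """Attenuate B12 reflectance with the Beer-Lambert law: R' = R * exp(-k * column)."""
    return b12 * np.exp(-K_EFF * column)


def plume_snr(b12_clean, b12_plume, min_frac=1e-3):
    """Mean B12 reduction over plume pixels divided by the scene's B12 standard deviation.

    Plume pixels are ASSUMED to be those whose fractional reduction exceeds min_frac.
    """
    reduction = b12_clean - b12_plume
    mask = reduction > min_frac * b12_clean
    if not mask.any():
        return 0.0
    return float(reduction[mask].mean() / b12_clean.std())


# Toy 128x128 background standing in for a real B12 crop.
rng = np.random.default_rng(0)
b12 = 0.3 + 0.01 * rng.standard_normal((128, 128))
column = gaussian_plume_column(b12.shape, source_rc=(64, 20), q_kg_h=1000.0, wind_m_s=3.0)
b12_plume = embed_plume(b12, column)
snr = plume_snr(b12, b12_plume)
```

Because the column scales as emission rate over wind speed, the same emission produces a weaker B12 signal in stronger wind, which is one reason the training set spans both quantities.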
Key Findings
- Synthetic benchmarks: The deep model reliably detects methane down to ~5% SNR, achieving about an order-of-magnitude improvement over MBMP thresholding. ROC analyses show extremely low pixel-wise false positive rates are achievable (e.g., <0.03% FPR at ~85% TPR for the operating threshold used), while MBMP cannot achieve low FPR at low SNR. Most misses occur for very small, low-SNR plumes (<5%).
- Real airborne catalog comparison: Across 7724 Sentinel-2 pairs for 2526 cataloged plumes, detection fraction depends more strongly on plume extent than cataloged rate, with a clear performance break at ~10,000 m² plume area. Detection fractions down to ~200–300 kg/h approach the catalog’s mean leak persistence (20–26%), indicating sensitivity near the observational limit imposed by intermittency; detections drop sharply below this range, with smallest detected plumes around ~60 kg/h.
- Operational robustness: Residual false detections are mainly due to clouds, rivers, and soil moisture changes; randomized-location/time tests quantify background detection/false positive rates.
- Controlled releases: The model blindly detected all four controlled releases timed with Sentinel-2 overpasses, including a 1.1 t/h release missed by most participating groups.
- Overall: The approach enables detection of emissions roughly an order of magnitude smaller than previous Sentinel-2 band-ratio methods, with near-global coverage at high spatial (20 m SWIR) and multi-day temporal resolution.
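The pixel-wise metrics quoted above (F1, TPR, FPR) follow from a confusion matrix between the thresholded model output and the true plume mask. The sketch below shows that calculation for a single scene; the function name `pixel_metrics` and the toy arrays are assumptions for illustration, and only the 0.2 threshold comes from the text.

```python
import numpy as np


def pixel_metrics(prob, truth, threshold=0.2):
    """Pixel-level precision, recall (TPR), F1 and FPR for one predicted mask."""
    pred = np.asarray(prob) > threshold
    truth = np.asarray(truth).astype(bool)
    tp = int(np.sum(pred & truth))     # plume pixels correctly flagged
    fp = int(np.sum(pred & ~truth))    # background pixels flagged as plume
    fn = int(np.sum(~pred & truth))    # plume pixels missed
    tn = int(np.sum(~pred & ~truth))   # background pixels correctly ignored
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Toy 2x2 example: one true plume pixel, one false alarm.
prob = np.array([[0.9, 0.1], [0.3, 0.05]])
truth = np.array([[1, 0], [0, 0]])
metrics = pixel_metrics(prob, truth)
```

Sweeping `threshold` from 0 to 1 and recording (FPR, TPR) at each value traces out a ROC curve of the kind used to compare the deep model with MBMP.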
Discussion
The study addresses the central challenge of detecting methane emissions using multispectral satellites, which traditionally lack the spectral resolution to isolate methane absorption reliably at small plume scales. By training a transformer-based segmentation model on large-scale synthetic plumes embedded in real Sentinel-2 imagery and leveraging two acquisition times, the method disentangles transient methane signatures from background variability. This substantially reduces false positives and enhances sensitivity, approaching the practical limits set by leak intermittency. The results demonstrate that public, general-purpose multispectral satellites can approach the detection capabilities of specialized hyperspectral systems for point sources, enabling automated monitoring at global scale every few days. This has significant implications for building timely, fine-scale methane inventories, prioritizing mitigation, and complementing existing multi-tier monitoring frameworks (from coarse global to targeted observations). The low false positive rates are key to automation at scale, reducing the need for manual masking/verification that has constrained prior multispectral methods. Sensitivity dependence on plume extent underscores the importance of spatial plume development (and wind) alongside emission rate, informing integration with rate inversion techniques and environmental context in operational pipelines.
Conclusion
The paper introduces a deep learning framework (ViT-encoder U-Net) trained on synthetic Gaussian plumes embedded in real Sentinel-2 data to automatically detect methane plumes. It achieves about an order-of-magnitude improvement over state-of-the-art rule-based multispectral methods, with demonstrated sensitivity down to ~200–300 kg/h (occasionally as low as ~60 kg/h) and very low false positive rates, validated against extensive airborne detections and controlled release experiments. The approach paves the way for automated, high-frequency, high-resolution global monitoring of persistent methane point sources using widely available multispectral satellites. Future work includes: reducing residual false positives via model ensembles and auxiliary data (water, wind, cloud masks, background methane), integrating rate-quantification (e.g., IME or Gaussian plume fits), scaling cloud-based preprocessing and inference for global operations, extending to Landsat and upcoming Sentinel-2C/D for near-daily coverage, and further validation on controlled releases below 1100 kg/h.
Limitations
- Temporal mismatch and intermittency: Satellite acquisitions often occur on different days than airborne detections; typical leak persistence (~20–26%) limits the maximum achievable detection fraction, complicating direct sensitivity attribution.
- Residual false positives: Remaining false detections are primarily associated with clouds, water bodies/rivers, and soil moisture changes.
- Training realism: Synthetic Gaussian plumes (with colored noise) approximate, but do not fully capture, atmospheric/dispersion complexity; however, they enable large-scale training.
- Scope: The model detects plume locations but does not estimate emission rates; additional inversion is needed for quantification.
- Controlled releases: Demonstrations include releases ≥1.1 t/h timed with Sentinel-2; validation for smaller, timed releases (<1100 kg/h) is pending.
- Generalization caveats: Although geographic splits improve generalization, performance may vary by surface type, illumination, and environmental conditions; fine-tuning by region may be beneficial.