
Cloud-based applications for accessing satellite Earth observations to support malaria early warning
M. C. Wimberly, D. M. Nekorchuk, et al.
Explore how climate variables like temperature and precipitation drive malaria epidemics! This research by Michael C. Wimberly, Dawn M. Nekorchuk, and Ramcharan R. Kankanala introduces REACH, an innovative cloud-based application leveraging satellite data to enhance early warning systems in Ethiopia, with potential for global application.
Introduction
The study addresses the need to monitor climate-related environmental risk factors that influence malaria transmission, such as temperature, precipitation, vegetation moisture, and soil moisture. With climate change intensifying heat extremes and altering rainfall patterns, there is growing interest in leveraging Earth-observing satellites to provide timely, spatially comprehensive data for public health applications. However, many researchers and practitioners, especially in low- and middle-income countries, face barriers including limited expertise with satellite data, insufficient computational resources, and low-bandwidth internet access. Cloud computing platforms like Google Earth Engine (GEE) offer a solution by enabling users to access, process, and summarize large geospatial datasets without downloading raw imagery. The objective of this work was to support the EPIDEMIA malaria early warning project in Ethiopia by developing REACH, a cloud-based application that automates the acquisition, processing, and woreda-level (district-level) summarization of satellite-derived environmental indicators for use in malaria forecasting.
Literature Review
The paper synthesizes prior evidence linking climate variability to human health, including associations between heat waves, rainfall variability, flooding, drought, and disease risks. It highlights extensive use of satellite remote sensing for public health, including monitoring vegetation, temperature, precipitation, air pollution, heat waves, and population distribution. The authors note growing adoption of GEE for large-scale geospatial analysis in general and for public health applications in particular. Drawing on workshops, interviews, and a survey of Ethiopian malaria professionals, they identify persistent barriers to environmental data access and processing in LMIC contexts; most survey respondents rated environmental data access as a moderate or major barrier. Earlier EPIDEMIA efforts used client-side processing, which proved impractical in low-bandwidth settings and motivated a shift to a cloud-based approach built on GEE.
Methodology
Overview: REACH generates daily summaries of satellite-derived environmental variables for all Ethiopian woredas to support malaria early warning. The workflow acquires, filters, harmonizes, and summarizes data from MODIS and GPM/IMERG within GEE, exporting results as CSV tables suitable for ingestion by EPIDEMIA.
Data sources and derived variables:
- MODIS Terra 8-day Land Surface Temperature and Emissivity (MOD11A2): daytime LST, nighttime LST, and mean LST (°C).
- MODIS Nadir BRDF-adjusted surface reflectance (MCD43A4): spectral indices including NDVI, SAVI, EVI, NDWI5 (using SWIR band 5, 1230–1250 nm), and NDWI6 (using SWIR band 6, 1628–1652 nm).
- GPM IMERG v6 (GEE asset combining the Early, Late, and Final runs): precipitation rates at 30-minute intervals, aggregated to daily totals (mm).
Quality control and masking:
- MOD11A2 LST: Daytime and nighttime LST were extracted along with their QA layers. Pixels with QA ≥ 2 in either the daytime or nighttime image were set to missing (masked). The 8-day LST means were converted to daily values by assuming constant temperature within each 8-day compositing period, and daily mean LST was computed as the mean of daytime and nighttime LST.
- MCD43A4 reflectance: Daily surface reflectance at 500 m with QA. Pixels with NIR band QA ≥ 2 were masked, as were permanent water bodies. The spectral indices NDVI, EVI, SAVI, NDWI5, and NDWI6 were computed from the surface reflectance bands.
- IMERG precipitation: The GEE asset ingests the Early and Late runs with low latency and replaces them with the Final run (which includes gauge correction) when it becomes available. The 30-minute, 0.1° data were aggregated to daily totals and resampled to 1000 m so that every woreda contains at least one grid cell for zonal summaries.
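The per-pixel rules above can be sketched in plain Python as an illustration; REACH itself runs these steps as Earth Engine image operations, which are not reproduced here. The sketch assumes the MOD11A2 scale factor of 0.02 K per digital number and fill value of 0, treats QA as an already-decoded integer (which QC bits REACH reads is not stated above), expects reflectance already scaled to 0–1, and uses the standard published formulas for NDVI, SAVI (soil factor 0.5), EVI (MODIS coefficients), and NIR-versus-SWIR normalized-difference water indices with MODIS bands 1 (red), 2 (NIR), 3 (blue), 5, and 6. REACH's exact formulas may differ, and the function names are hypothetical.

```python
"""Per-pixel sketch of QA masking, LST scaling, and spectral indices."""

from typing import Optional

LST_SCALE_K = 0.02   # MOD11A2 LST scale factor (kelvin per digital number)
LST_FILL = 0         # MOD11A2 LST fill value
KELVIN_TO_C = 273.15
QA_MASK_FROM = 2     # per the text: decoded QA values >= 2 are masked


def lst_celsius(dn: int, qa: int) -> Optional[float]:
    """Convert one MOD11A2 LST digital number to deg C, or None if masked."""
    if dn == LST_FILL or qa >= QA_MASK_FROM:
        return None
    return dn * LST_SCALE_K - KELVIN_TO_C


def mean_lst(day_c: Optional[float], night_c: Optional[float]) -> Optional[float]:
    """Mean of daytime and nighttime LST; missing if either input is missing."""
    if day_c is None or night_c is None:
        return None
    return (day_c + night_c) / 2.0


def spectral_indices(red: float, nir: float, blue: float, swir5: float,
                     swir6: float, nir_qa: int,
                     is_water: bool) -> Optional[dict[str, float]]:
    """Indices from reflectance scaled to 0-1; None for masked pixels."""
    if nir_qa >= QA_MASK_FROM or is_water:
        return None
    return {
        "ndvi": (nir - red) / (nir + red),
        "savi": 1.5 * (nir - red) / (nir + red + 0.5),  # soil factor L = 0.5
        "evi": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        "ndwi5": (nir - swir5) / (nir + swir5),         # NIR vs ~1240 nm SWIR
        "ndwi6": (nir - swir6) / (nir + swir6),         # NIR vs ~1640 nm SWIR
    }


day = lst_celsius(15000, qa=0)    # 15000 * 0.02 K = 300 K
night = lst_celsius(14500, qa=1)  # 14500 * 0.02 K = 290 K
print(day, night, mean_lst(day, night))
print(spectral_indices(0.05, 0.30, 0.03, 0.25, 0.15, nir_qa=0, is_water=False))
```

Returning None rather than a sentinel number keeps masked pixels out of any later averaging, which matches the "set to missing" rule in the text.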
Spatial summarization:
- Administrative boundaries: Harmonized woreda boundaries (2019 baseline with historical reconciliation) uploaded to GEE as a public asset. Zonal statistics computed as daily means of valid pixels per woreda for all variables.
- For MODIS variables, outputs include the number of cloud-free pixels used and the total number of pixels per woreda-day; if no cloud-free pixels exist, no value is returned.
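The zonal step can be illustrated with a plain-Python stand-in for the Earth Engine reduction. This is a sketch, assuming each woreda's pixels for one day have already been collected into a list with masked pixels represented as None; it ignores partial-pixel weighting at woreda boundaries, and the function and field names are hypothetical.

```python
"""Sketch of a woreda-level zonal mean that also reports pixel counts."""

from typing import Optional, Sequence, TypedDict


class ZonalSummary(TypedDict):
    mean: Optional[float]
    valid_pixels: int
    total_pixels: int


def zonal_mean(pixels: Sequence[Optional[float]]) -> ZonalSummary:
    """Average the valid (unmasked) pixels inside one woreda for one day.

    Masked pixels are None. If no valid pixels exist, the mean is None,
    mirroring the "no value is returned" rule described above.
    """
    valid = [p for p in pixels if p is not None]
    mean = sum(valid) / len(valid) if valid else None
    return {"mean": mean, "valid_pixels": len(valid), "total_pixels": len(pixels)}


print(zonal_mean([24.1, None, 25.3, None]))
print(zonal_mean([None, None]))
```

Reporting both counts lets downstream users judge how representative each woreda-day mean is, for example after heavy cloud cover.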
Temporal harmonization and outputs:
- All variables expressed at daily resolution (LST pseudo-daily from 8-day composites; spectral indices daily; precipitation aggregated daily). Outputs formatted as CSV tables with one row per date-by-woreda, including identifiers (woreda, zone, region, year, day-of-year) and variable columns. Designed for automated ingestion by EPIDEMIA.
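A minimal sketch of the temporal harmonization and CSV layout follows. Assumptions: composites are keyed by their first day; the period_days parameter allows a shorter final composite (MOD11A2's last composite of a year starts on day 361); IMERG half-hourly values are rates in mm/h, so each contributes rate × 0.5 h to the daily total; a day is taken as 48 half-hour slots, since REACH's day boundary (time zone) is not stated above; and the column names and identifiers (W001, Z01) are hypothetical placeholders rather than REACH's exact schema.

```python
"""Sketch of harmonizing LST and IMERG inputs to one value per day."""

import csv
import io
from datetime import date, timedelta
from typing import Optional, Sequence


def expand_composite(start: date, value: Optional[float],
                     period_days: int = 8) -> dict[date, Optional[float]]:
    """Repeat a composite value for each day of its compositing period."""
    return {start + timedelta(days=i): value for i in range(period_days)}


def imerg_daily_total(rates_mm_per_h: Sequence[float]) -> float:
    """Convert 48 half-hourly IMERG rates (mm/h) into a daily total (mm)."""
    if len(rates_mm_per_h) != 48:
        raise ValueError("expected 48 half-hourly values for one day")
    return sum(rate * 0.5 for rate in rates_mm_per_h)  # each rate holds for 0.5 h


def to_csv(rows: list[dict]) -> str:
    """Write date-by-woreda rows; None values become empty fields."""
    fields = ["woreda", "zone", "region", "year", "doy", "lst_mean_c", "precip_mm"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


lst = expand_composite(date(2020, 1, 1), 24.5)
rows = [
    {"woreda": "W001", "zone": "Z01", "region": "Amhara", "year": d.year,
     "doy": d.timetuple().tm_yday, "lst_mean_c": v,
     "precip_mm": imerg_daily_total([0.0] * 48)}
    for d, v in sorted(lst.items())
]
print(to_csv(rows))
```

Keeping one row per date and woreda, with identifiers repeated on every row, makes the tables easy to append and join without any lookup files.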
Implementations:
1) GEE JavaScript API in the Code Editor: Web-based UI for date selection, visualization, and triggering export tasks to Google Drive. Requires GEE account; supports larger requests; allows code modification for new geographies or datasets.
2) Earth Engine App: Public URL without login; similar UI for date selection and visualization; direct downloads from the browser; best suited to recent weeks to months of data, because long, multi-year requests can time out.
3) Python API package: Function gee_to_drive(start_date, end_date) to automate acquisition, processing, and export to Google Drive after authentication. Callable from Python or R (via reticulate) and schedulable for routine updates.
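For scheduled updates, a thin wrapper can compute a rolling date window and hand it to gee_to_drive. Only the name gee_to_drive(start_date, end_date) comes from the text; the ISO YYYY-MM-DD date strings, the inclusive end date, the 14-day lookback, and the helper names are assumptions. The export function is passed in as a parameter so the sketch runs without Earth Engine credentials; in practice you would pass the REACH package's gee_to_drive after authenticating.

```python
"""Sketch of a routine-update wrapper around an export function."""

from datetime import date, timedelta
from typing import Callable


def update_window(today: date, lookback_days: int = 14) -> tuple[str, str]:
    """Return (start_date, end_date) ISO strings for a rolling window.

    Assumes the end date is inclusive and ends on the last complete day.
    """
    end = today - timedelta(days=1)
    start = end - timedelta(days=lookback_days - 1)
    return start.isoformat(), end.isoformat()


def run_update(export: Callable[[str, str], object], today: date) -> tuple[str, str]:
    """Call the export function (e.g. gee_to_drive) for the current window."""
    start, end = update_window(today)
    export(start, end)
    return start, end


# Stand-in exporter for illustration; replace with gee_to_drive in practice.
run_update(lambda s, e: print(f"exporting {s} to {e}"), date.today())
```

Re-requesting an overlapping window lets late-arriving imagery fill earlier gaps on the next run; the duplicated woreda-days are then resolved when EPIDEMIA ingests the CSVs. A scheduler such as cron could invoke the script daily or weekly.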
Performance and data volume:
- Cloud-based processing reduces local storage and bandwidth demands. Example: ~3.8 TB of raw data to compute one year of daily environmental indices for Ethiopia vs 88 MB for the corresponding woreda-level CSV summaries (>43,000× reduction). A 20-year archive (2002–2021) of daily woreda-level summaries was generated.
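The reduction factor follows directly from the two data volumes. Assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes):

```latex
\frac{3.8\ \text{TB}}{88\ \text{MB}}
  = \frac{3.8 \times 10^{12}\ \text{bytes}}{8.8 \times 10^{7}\ \text{bytes}}
  \approx 4.32 \times 10^{4}
```

With binary units (TiB and MiB) the ratio is about 45,000, so the >43,000× figure holds under either convention.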
Integration with EPIDEMIA:
- EPIDEMIA ingests CSVs, resolves duplicates, fills gaps, and links to malaria surveillance for model calibration and 12-week-ahead forecasting. Reports include timeseries and maps of incidence and climate anomalies.
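How duplicate and missing woreda-days might be handled on ingestion can be sketched as follows. EPIDEMIA's own ingestion code is R-based and is not reproduced here; this Python sketch assumes a "later download wins" rule for duplicates and hypothetical column names (woreda, year, doy). Gap filling itself is left out; the sketch only lists the missing dates.

```python
"""Sketch of merging overlapping CSV downloads and finding missing days."""

import csv
import io
from datetime import date, timedelta

Key = tuple[str, int, int]  # (woreda, year, day-of-year)


def merge_downloads(csv_texts: list[str]) -> dict[Key, dict[str, str]]:
    """Combine CSV downloads; a later file wins when a woreda-day repeats."""
    merged: dict[Key, dict[str, str]] = {}
    for text in csv_texts:  # assumed ordered oldest download first
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["woreda"], int(row["year"]), int(row["doy"]))
            merged[key] = row  # overwrite any earlier duplicate
    return merged


def missing_days(merged: dict[Key, dict[str, str]], woreda: str,
                 start: date, end: date) -> list[date]:
    """List dates in [start, end] that have no row for the given woreda."""
    gaps = []
    day = start
    while day <= end:
        if (woreda, day.year, day.timetuple().tm_yday) not in merged:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


old = "woreda,year,doy,precip_mm\nW001,2020,1,2.0\nW001,2020,2,0.0\n"
new = "woreda,year,doy,precip_mm\nW001,2020,2,1.5\n"
table = merge_downloads([old, new])
print(missing_days(table, "W001", date(2020, 1, 1), date(2020, 1, 4)))
```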
Key Findings
- REACH enables operational access to satellite-derived environmental indicators (LST, vegetation/water indices, precipitation) summarized daily for all Ethiopian woredas, supporting malaria forecasting.
- Data volume reduction: Converting raw imagery (~3.8 TB per year for Ethiopia) to woreda-level CSV summaries (~88 MB per year) reduces download/storage by a factor >43,000, facilitating use in low-bandwidth settings.
- Historical archive: A 20-year (2002–2021) daily archive of woreda-level environmental summaries was produced and shared.
- Operational feasibility: In the Amhara pilot (2019–2020), users at Bahir Dar University and the Amhara Regional Health Bureau regularly obtained and integrated updates; end-to-end forecasting workflow could be completed within ~1 hour.
- Stakeholder needs: In a survey of Ethiopian malaria professionals, 12 respondents rated environmental data access as a major barrier, 11 as moderate, and 1 as not a barrier, underscoring the need that REACH was designed to address.
- Implementation trade-offs: JavaScript API allows flexible, large requests and code customization but requires a GEE account; Earth Engine App is simplest for recent data but times out for multi-year requests; Python API supports automation and integration with R-based EPIDEMIA.
- Data quality handling: MODIS cloud contamination is common (especially June–August); REACH masks low-quality pixels and reports counts of valid pixels used per woreda-day; missing values are subsequently screened and imputed within EPIDEMIA.
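Screening and imputation happen inside EPIDEMIA, and the method is not described here. As an illustration only, the sketch below discards woreda-day values whose valid-pixel fraction falls below a hypothetical threshold (MIN_VALID_FRACTION = 0.1) and then fills interior gaps by linear interpolation, a swapped-in technique that may differ from what EPIDEMIA actually does.

```python
"""Sketch of screening by valid-pixel fraction, then filling interior gaps."""

from typing import Optional, Sequence

MIN_VALID_FRACTION = 0.1  # hypothetical threshold, not taken from REACH or EPIDEMIA


def screen(values: Sequence[Optional[float]], valid_counts: Sequence[int],
           total_counts: Sequence[int],
           min_fraction: float = MIN_VALID_FRACTION) -> list[Optional[float]]:
    """Set a value to None when too few of its woreda's pixels were valid."""
    out: list[Optional[float]] = []
    for v, n_valid, n_total in zip(values, valid_counts, total_counts):
        if v is None or n_total == 0 or n_valid / n_total < min_fraction:
            out.append(None)
        else:
            out.append(v)
    return out


def interpolate(series: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Linearly interpolate interior gaps; leading/trailing gaps stay None."""
    result = list(series)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


daily_lst = screen([20.0, 22.0, None, 26.0], [50, 2, 0, 40], [100, 100, 100, 100])
print(interpolate(daily_lst))
```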
Discussion
The findings demonstrate that a cloud-based architecture using GEE can overcome bandwidth, storage, and computational constraints that hinder routine use of satellite data for malaria early warning in Ethiopia. By automating QA filtering, temporal harmonization, and zonal summarization, REACH delivers compact, analysis-ready datasets suitable for integration with surveillance systems. The successful pilot in Amhara shows that public health partners can reliably access and apply environmental summaries to generate weekly forecasts within operational time constraints. Comparing implementations clarifies fit-for-purpose choices: the Code Editor (JavaScript API) supports power users and large historical backfills; the Earth Engine App lowers barriers for routine, small-range updates; and the Python API enables automated pipelines. Handling of cloud-induced missing data at the summary stage, coupled with downstream imputation in EPIDEMIA, provides a pragmatic balance between data quality and processing complexity. Overall, REACH addresses a central barrier to scaling malaria early warning by providing timely, national-scale environmental data access compatible with existing modeling workflows.
Conclusion
This work presents REACH, a GEE-based application that streamlines access to daily, woreda-level environmental indicators for malaria early warning in Ethiopia. It demonstrates substantial reductions in data volume, operational feasibility in low-bandwidth contexts, and successful integration with EPIDEMIA to support 12-week forecasts. The software is available via a simple web app, a flexible JavaScript implementation, and an automatable Python package. Future directions include extending REACH to other geographies, integrating additional environmental datasets available in GEE, exploring alternative cloud platforms (e.g., Azure, AWS) as complementary options, and enhancing automation and robustness for national-scale operations.
Limitations
- Cloud contamination in optical/thermal MODIS data leads to missing observations, especially during the rainy season. Although REACH reports valid pixel counts and returns no value for woreda-days without any cloud-free pixels, the resulting gaps require downstream imputation.
- Earth Engine App can time out on large, multi-year requests, limiting its use to recent weeks to months; historical backfills are better handled via the JavaScript or Python APIs.
- The JavaScript API requires a GEE account and familiarity with the Code Editor; modifying the code base to add datasets or regions requires programming expertise.
- Automation via the Python API adds system complexity (authentication management, job scheduling, monitoring) that may challenge some operational settings.
- LST is a proxy for near-surface air temperature with context-dependent relationships influenced by land cover and meteorological conditions; interpretations should consider these nuances.
- IMERG Final Run has a latency of months; near-real-time analyses rely on Early/Late Runs that are later revised, potentially affecting retrospective consistency.