
Computer Science
Quantum Algorithms applied to Satellite Mission Planning for Earth Observation
S. Rainjonneau, I. Tokarev, et al.
Discover how innovative quantum algorithms developed by the team at Terra Quantum AG can revolutionize satellite mission planning, achieving an unprecedented 98.5% success rate in high-priority task completion, far surpassing traditional methods.
~3 min • Beginner • English
Introduction
The paper addresses the satellite mission planning problem for Earth-observing imaging satellites, a computationally hard scheduling and routing task where satellites must sequence image acquisition requests within limited Data Take Opportunity (DTO) windows while accounting for maneuver times. The goal is to maximize the number of completed image requests, especially high-priority ones, under orbital and agility constraints. With the emergence of quantum technologies, the study investigates whether hybrid quantum-classical approaches can improve solution quality or efficiency over classical baselines. It focuses on practical, near-term quantum techniques (gate-based hybrid models and quantum-compatible optimization formulations) applied to real datasets, including scenarios with thousands of requests and multiple satellites orbiting along Earth’s terminator. The importance lies in improving utilization of costly satellite assets and enabling scalable planning for complex constellations.
Literature Review
Prior work applied classical optimization and heuristics to satellite scheduling, including simulated annealing, ant colony optimization, and other metaheuristics for Earth observation scheduling and daily photo selection. A quantum annealing approach for image acquisition planning was proposed, but reviews reported no clear practical advantage over classical methods. The present work explores gate-based hybrid quantum models within reinforcement learning and a quantum-friendly optimization formulation (QUBO-compatible integer model), aiming to demonstrate practical performance gains over strong classical baselines.
Methodology
Problem setup: Satellites orbit on the Earth’s terminator (~15 orbits/day). Each request has a DTO time window and requires a continuous acquisition slot while the satellite points to the target. Rotations are limited to 1 degree per second, and relaying between targets adds maneuver time, reducing agility. Two datasets are used: a single-satellite set with 462 requests and a two-satellite set with 2000 requests. Performance is evaluated primarily on highest-priority (π1) requests, with completion rate as the main metric.
Data formatting: Inputs include (i) satellite ephemerides (orbit number, timestamp, position/velocity in ECI), and (ii) requests (ID, priority 1–4, DTO start/end times, median line start/end coordinates, satellite ID, completion flags). Satellite ECI coordinates and request geolocations are used to compute pointing angles and durations.
Priority ordering: Objectives prioritize completing all higher-priority requests before considering lower ones; evaluation focuses on π1 completion for comparability.
Relaying algorithm: Transition time between requests is computed using the angle between satellite–target vectors at the final median point of the first request and the initial median point of the second request. Assuming constant 1°/s rotation and spherical Earth, the angle (degrees) approximates relaying time (seconds).
Solution chaining: Requests are plotted on a time axis by DTO windows; feasible acquisitions are chained by ensuring available time for relaying plus acquisition between consecutive requests. This links independent DTO windows into an executable sequence. To manage complexity, clustering is used.
Clustering: K-means clustering (k determined empirically) groups requests using DTO start/end times and median line coordinates, creating geographically and temporally coherent clusters that reduce problem size and facilitate efficient relaying. Simpler bunching heuristics (DTO overlap, priority) are also described in the appendix.
Greedy baseline: Within clusters and per satellite, a simple time-incrementing greedy algorithm executes the highest-priority available request if enough remaining time exists; otherwise it skips. It yields a fast reference but limited foresight, especially as problem complexity grows.
Integer optimization (linear model, QUBO-adaptable): An integer programming formulation selects request orders, start angles, and successions to maximize a weighted objective favoring higher-priority completions and earlier starts, subject to constraints: each order slot holds at most one request; requests are executed at most once; consistency between chosen start angles and feasible successions; no gaps in the execution queue until stop; feasible maneuver mappings between requests using precomputed minimal relaying times (via Orekit-based functions). The cost function balances completion and time-penalization terms; higher-priority requests have weights exceeding any sum of lower-priority weights. The model is solved with a branch-and-cut solver (Gurobi). Clusters are solved independently and linked by adjusting DTO starts in subsequent clusters if relaying from the previous cluster is infeasible. The model is compatible with QUBO/Ising encoding and quantum annealing in principle, though current devices struggle with larger instances; linear programming solves large clusters efficiently in practice.
Reinforcement learning (RL): Two environments were designed: (1) satellite-centered, where the agent, per satellite, observes up to the 100 nearest (by DTO open time) incomplete requests with features plus current time and last completion location, choosing which request to execute; completion yields reward 1; and (2) request-centered, where the agent selects the best satellite for each request using nearest options and minimum execution time under chaining. PPO is employed as the RL algorithm.
Hybrid-quantum models: A hybrid quantum-classical policy network is implemented by feeding classical neural outputs into a 4-qubit parametrized quantum circuit (PQC) with variational layers (Pauli-X rotations, ring CNOT entanglement), parallel Pauli-Z data encoding, and Z-basis measurements to produce classical outputs. This hybrid PPO shows faster learning and higher reward than a similarly sized classical network. A hybrid AlphaZero is developed using Monte Carlo Tree Search (MCTS) for selection/expansion/simulation/backpropagation, coupled with an encoding network and a PQC-based policy head (plus a value head). The MCTS environment simulates actions, and an outer training loop optimizes policy and value losses over batches. Training was performed on the QMware quantum cloud.
Key Findings
- On the 2-satellite/2000-request benchmark, highest-priority (π1) completion rates were: Greedy baseline 75.8%; Greedy + K-means 63.6%; Integer optimization 98.1%; Hybrid AlphaZero 98.5% (best).
- The hybrid quantum-enhanced PPO policy achieved higher rewards in far fewer steps than a classical network of similar complexity (e.g., hybrid surpassed classical performance by ~8k steps vs >110k for classical in the shown runs), indicating improved sample efficiency.
- Linear programming (integer optimization) solved the 2-satellite/2000-request dataset in under 3 minutes when decomposed into suitably small clusters, approaching greedy runtime while vastly improving optimality.
- On the single-satellite dataset, the optimization approach achieved 100% completion for π1–π3 and 96.2% for π4 in about 6 minutes.
- Clustering (K-means) assists runtime/scalability but naive clustering can degrade greedy accuracy (π1 dropping from 75.8% to 63.6% on the large dataset).
Discussion
The study demonstrates that satellite mission planning benefits significantly from sophisticated optimization and reinforcement learning approaches compared to a simple greedy baseline. By explicitly modeling maneuver times, DTO window constraints, and execution order, the integer optimization model finds near-optimal schedules with high π1 completion, while remaining computationally tractable via clustering and chaining. Hybrid quantum-enhanced RL, particularly the hybrid AlphaZero with a PQC policy, further improves solution quality, achieving the best π1 completion rate. These results address the central question of whether quantum-inspired or hybrid quantum-classical methods can yield practical advantage in complex scheduling: hybrid RL accelerates learning and improves performance, while the optimization model provides strong, fast solutions using classical solvers and remains compatible with future quantum hardware. Together, they provide a pathway to scalable, high-quality mission planning for multi-satellite systems.
Conclusion
This work contributes two high-performing approaches to Earth observation satellite mission planning: (1) a quantum-compatible integer optimization model that yields near-optimal plans rapidly when combined with clustering and chaining, and (2) a hybrid quantum-enhanced RL framework—culminating in a hybrid AlphaZero—that surpasses classical baselines on a challenging multi-satellite, large-request dataset. The best reported π1 completion rates on the 2-satellite/2000-request dataset are 98.5% (hybrid AlphaZero) and 98.1% (integer optimization), well above the greedy baseline. These findings indicate that hybrid quantum-classical algorithms are promising for real-world space mission planning and similar scheduling problems.
Potential future directions include: scaling to larger constellations and longer planning horizons; integrating additional operational constraints (e.g., onboard storage, downlink scheduling, energy/thermal limits); advancing QUBO/Ising mappings for improved performance on quantum annealers or future fault-tolerant devices; and developing real-time replanning with dynamic task arrivals and disturbances.
Limitations
- The greedy baseline has limited foresight and struggles with complex, overlapping DTO windows; clustering can further reduce its accuracy.
- The optimization model’s QUBO form is challenging for current quantum and hybrid solvers: even small clusters can be hard for quantum annealers or QUBO solvers within short time limits, whereas linear programming scales better at present.
- Some simplifying physical assumptions are used (e.g., constant rotation rate of 1°/s and spherical Earth) that may not capture all spacecraft dynamics.
- The RL environments include engineered observation spaces (e.g., selecting nearest requests) and, in the satellite-centered environment, some data were artificially generated; generalization to broader real-world distributions requires further validation.
- Evaluation emphasizes highest-priority (π1) completion; performance on lower priorities or multi-objective trade-offs (e.g., energy, storage) is less explored.
Related Publications
Explore these studies to deepen your understanding of the subject.