Space Sciences
Reinforcement Learned Adversarial Agent (ReLAA) for Active Fault Detection and Prediction in Space Habitats
M. Overlin, S. Iannucci, et al.
Space tourism is growing, with plans for sub-orbital leisure flights and short-duration crewed missions increasing the need for accurate system health monitoring (SHM) to ensure safety and reliability. Conventional SHM approaches are often passive, relying on fixed thresholds and failing to capture dynamic relationships and coupled or cascading faults. Purely data-driven approaches can be updated online but often lack representative fault data and are difficult to train when failures are rare and hardware is expensive. Physics-based or hybrid physics/data-driven models offer advantages in explainability, generalizability, and interpretability, which are important for life-sustaining systems, and digital twins integrating physics-based models have proven effective when large supervised datasets are unavailable. In this work, reinforcement-learned adversarial agents (ReLAAs) are developed to recognize and predict faults by learning actions that would induce them, with the actions executed on a digital twin rather than on hardware. Neuroevolution is used to train the neural-network policies, enabling parameter learning while promoting greater solution diversity than gradient-based training; because genetic algorithms can collapse to a single solution, diversity-promoting methods (clustering and fitness sharing) are employed. The study introduces a framework for active fault detection and prediction on a physical mock space habitat; outlines the hardware, middleware, digital twin, fault elicitation, and reinforcement learning methods; presents experimental and agent deployment results; and discusses implications.
Physical demonstration system: A mock space habitat was designed and instrumented as a system-of-systems comprising a thermal control system (TCS), a grey water filtering system (GWS), and an electrical integration model. The habitat volume was ~28 m³ with ~9.3 m³ per occupant (three occupants assumed), guiding GWS sizing and load estimates. Expected waste water processing was 21 L/h with ~250 W for GWS, ~560 W for TCS, and a 200 W load bank to emulate other Environmental Control and Life Support Systems (ECLSS) loads, with a 1 kW DC power supply sized to ensure that not all systems can be powered simultaneously, enabling cascading fault scenarios.
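A quick arithmetic check makes the deliberate power oversubscription explicit; the figures below come from the sizing in the text (pump and instrumentation draws are not itemized here):

```python
# Nominal subsystem loads from the habitat design (watts).
loads_w = {"GWS": 250, "TCS": 560, "ECLSS load bank": 200}
supply_w = 1000  # 1 kW DC power supply

total_w = sum(loads_w.values())
deficit_w = total_w - supply_w
print(f"Nominal demand {total_w} W vs supply {supply_w} W "
      f"(oversubscribed by {deficit_w} W)")
```

Even before the pump and sensing loads are counted, nominal demand exceeds the supply, so powering every subsystem at once is impossible by design, which is what enables cascading fault scenarios.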
Thermal Control System (TCS): Implemented as a single closed-loop water system (simplified relative to spacecraft practice). Key components included pump, heat exchanger, heater, and chiller; extensive instrumentation (temperature, pressure, flow) provided observability to regulate room temperature and capture fault signatures.
Grey Water Filtering System (GWS): Based on prior NASA designs. Water flows through forward osmosis (FO) to transfer water from feed to draw solution via a semi-permeable membrane, yielding a saline draw solution. Reverse osmosis (RO) then uses hydraulic pressure to produce potable water from the draw solution, returning reject flow to the draw tank (approx. 2:1 reject:product). Sensors include pressure, flow, total dissolved solids, power consumption, and tank levels.
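The approximate 2:1 reject-to-product split implies a simple mass balance on the RO stage; the numbers below are illustrative, not taken from the paper:

```python
def ro_flows(product_lph, reject_ratio=2.0):
    """Given a desired potable product flow (L/h) and the approximate
    2:1 reject:product ratio, return the reject flow returned to the
    draw tank and the total draw-side intake to the RO stage."""
    reject_lph = reject_ratio * product_lph
    intake_lph = product_lph + reject_lph
    return reject_lph, intake_lph

# e.g. producing 7 L/h of potable water
reject, intake = ro_flows(7.0)
```

For 7 L/h of product, roughly 14 L/h is rejected back to the draw tank, so the RO stage must draw about 21 L/h from the draw solution.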
Instrumentation, data acquisition, and control: A Raspberry Pi 4B running NASA’s core Flight System (cFS) middleware ingests sensor data, processes raw voltages, records histories, provides remote control and data synchronization, and communicates with two Arduino Mega microcontrollers (one for sensing, one for actuator control).
Digital twin development: An integrated physics-based model (TCS, GWS, electrical) was developed in Simscape and compiled as a Functional Mock-up Unit (FMU) via the FMI standard for use by the ReLAA. Model parameters were tuned against experimental data, achieving mean absolute deviation ≤7% between simulated and measured waveforms while prioritizing faster-than-real-time execution (real-time factor ~20:1). The FMU enables offline recognition of faults from historical data or online forward-looking prediction from current states. Due to FMU and sensor limitations, a database of historical internal states was created; a Ball Tree nearest-neighbor search maps current sensor states to the closest internal FMU state to initialize simulations.
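The nearest-neighbor state retrieval described above can be sketched with scikit-learn's `BallTree`; the database dimensions, random data, and `closest_internal_state` helper are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.neighbors import BallTree

# Historical sensor snapshots (rows), each paired with the full internal
# FMU state recorded at the same instant (dimensions are assumed).
sensor_history = np.random.rand(5000, 50)    # 50 sensor channels
internal_states = np.random.rand(5000, 200)  # stored FMU state vectors

tree = BallTree(sensor_history)

def closest_internal_state(current_sensors):
    """Map current sensor readings to the nearest stored FMU state."""
    _, idx = tree.query(current_sensors.reshape(1, -1), k=1)
    return internal_states[idx[0, 0]]

# The returned vector would initialize the FMU before a
# forward-looking prediction run.
init_state = closest_internal_state(np.random.rand(50))
```

Because the query returns a stored neighbor rather than the true current internal state, this initialization is approximate, a limitation the paper notes.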
Fault elicitation and design of experiments: A failure modes and effects analysis (FMEA) identified faults and controllable mechanisms for elicitation. Examples include: GWS membrane fouling/clogging/restrictions/leaks via valves; TCS freezing via chiller/heater and blockages/leaks via valves; electrical power spikes via variable resistors, shorts via circuit breaker, power loss via DC supply; sensor failures via power disconnect/software logic and drift via variable resistors/software; miscellaneous load spikes via load bank. Faults were exercised individually and in combinations to study cascades and establish baselines for training and validation.
Reinforcement learning with neuroevolution: The adversarial agent is a feed-forward neural network mapping the state (50 sensor measurements) to action probabilities (softmax over 14 possible fault/actuation options). Actions are sampled from these output probabilities (assuming a normal distribution for selection) and issued to the FMU (or to hardware during benign experiments). Agents act over limited time horizons to discover imminent failure cases. Training uses a genetic algorithm with clustering and fitness sharing to maintain diversity: the distance between policies is computed as the KL divergence of their action distributions over a set of recent states, and each agent's fitness is scaled inversely by the density of neighbors within a divergence threshold δ, discouraging convergence on similar solutions.
Reward design: For each feature, a per-feature reward rv equals 1 when the measurement violates its upper or lower bound and otherwise decreases toward zero as the measurement approaches the mid-point of its healthy range. The state reward r is the sum over features of rv/100, and the agent fitness Fi is the cumulative sum of state rewards over a simulation run, so higher fitness corresponds to more successful fault induction and identification.
Parallelized training: The digital twin runs at ~20× real time; each training run executes in a standalone CPU process with up to 24 concurrent branches, yielding up to ~18× speedup over serial training and broadening state-space exploration.
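A minimal sketch of the per-feature reward described above; the linear decay from the mid-point and the example bounds are assumptions, while the 1-at-violation behavior and the normalization follow the text:

```python
def feature_reward(value, lower, upper):
    """rv = 1 when the measurement violates its bounds; otherwise it
    decays (linearly, an assumption) toward 0 at the mid-point of the
    healthy band."""
    if value <= lower or value >= upper:
        return 1.0
    mid = 0.5 * (lower + upper)
    half_width = 0.5 * (upper - lower)
    return abs(value - mid) / half_width  # 0 at mid-point, ->1 at a bound

def state_reward(values, bounds, scale=100.0):
    """Sum of per-feature rewards, scaled as in the text."""
    return sum(feature_reward(v, lo, hi)
               for v, (lo, hi) in zip(values, bounds)) / scale

def fitness(trajectory, bounds):
    """Agent fitness: cumulative state reward over a simulation run."""
    return sum(state_reward(state, bounds) for state in trajectory)
```

Under this shaping, an adversarial agent earns more reward the closer it drives measurements toward (and past) their bounds, so maximizing fitness corresponds to inducing faults.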
Training protocol: Each generation evaluates 36 agents. Agents are clustered (target of six clusters), and the top two performers per cluster are selected as parents (12 parents). Offspring inherit parameters perturbed by Gaussian noise (parameters constrained to [−1, 1], maximum variability 0.25, standard deviation 0.10). Each agent's performance is its cumulative reward across 10 test runs per generation.
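The selection and mutation steps of the protocol can be sketched as follows; treating the "maximum variability" of 0.25 as a per-parameter cap on the noise is an interpretation, and the cluster/population data structures are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(parent, sigma=0.10, max_delta=0.25):
    """Offspring inherit the parent's parameters plus Gaussian noise
    (sigma = 0.10), with each change capped at 0.25 (assumed meaning of
    'maximum variability') and the result clipped to [-1, 1]."""
    noise = np.clip(rng.normal(0.0, sigma, parent.shape),
                    -max_delta, max_delta)
    return np.clip(parent + noise, -1.0, 1.0)

def select_parents(clusters, per_cluster=2):
    """clusters: list of lists of (params, fitness) tuples.
    The top performers in each cluster become parents."""
    parents = []
    for members in clusters:
        ranked = sorted(members, key=lambda pf: pf[1], reverse=True)
        parents.extend(p for p, _ in ranked[:per_cluster])
    return parents

def next_generation(parents, pop_size=36):
    """Fill the next population by mutating parents round-robin."""
    return [mutate(parents[i % len(parents)]) for i in range(pop_size)]
```

With six clusters and two parents per cluster this yields 12 parents, from which the next 36-agent generation is produced.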
- Experimental campaigns: 20 experiments were conducted with multiple disturbances to emulate normal and faulted operations. TCS experiments showed room temperature deviations under induced faults (e.g., pump blockage, reduced chiller flow, bypass valve toggling). GWS pressures deviated under clogs, leaks, or component damage compared to healthy baselines.
- Digital twin accuracy and speed: The integrated model matched experimental waveforms with mean absolute deviation ≤7% while achieving ~20× faster-than-real-time simulation, enabling efficient training and prediction.
- Training performance: Over 100 generations, top-agent fitness improved steadily, peaking around generation 65 and then plateauing as agents converged to near-optimal strategies. Cluster diversity (average KL divergence) increased by a factor of ~20 during the first 20 generations, then stabilized as cluster fitness improved.
- Fault induction and prediction: On simulation-derived test data, two trained agents forced system-level faults via different actuation strategies under an RO clog disturbance (clog severity ramped at 0.5% per second up to 30%). Agent 1 actuated the RO clog in the GWS, increasing pump power draw and triggering a cascade: the DC bus voltage dropped, the TCS pump underperformed, TCS flow fell, and room-temperature regulation failed, demonstrating an indirect cross-subsystem fault path. Agent 2 actuated a TCS intake strainer clog, producing a gradual pressure drop and reduced flow; while not a single-action failure route on its own, in combination with other minor issues it contributed to a system fault, illustrating the framework’s capacity to identify multi-cause latent failures.
- Parallelization: Running 24 concurrent simulations provided up to ~18× training speedup, enabling broader exploration of operational scenarios and diverse fault generation events.
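The concurrent-evaluation pattern above can be sketched with Python's `concurrent.futures`; the paper runs each rollout in a standalone CPU process, while this self-contained sketch uses a thread pool and a stand-in for the FMU simulation:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_agent(agent_id):
    """Stand-in for one faster-than-real-time FMU rollout; the real
    framework would return the agent's cumulative reward."""
    return agent_id * 0.1  # placeholder fitness

def evaluate_population(agent_ids, workers=24):
    # Up to 24 concurrent evaluations, mirroring the paper's setup
    # (ProcessPoolExecutor would match its per-process execution).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate_agent, agent_ids))

fitnesses = evaluate_population(range(36))
```

Because `map` preserves input order, each fitness lines up with its agent, which keeps the selection step of the genetic algorithm simple.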
The study demonstrates that reinforcement-learned adversarial agents, trained on a validated physics-based digital twin, can actively identify actions leading to both direct and latent, cross-subsystem faults in a mock space habitat. By learning to perturb the system within a simulated environment, ReLAAs reveal vulnerabilities and cascading failure pathways that passive threshold-based SHM would miss. The neuroevolutionary approach with clustering and fitness sharing successfully prevented collapse to a single solution, maintaining diversity and enabling multiple, distinct failure-inducing strategies. Improvements in agent fitness across generations and stabilized yet elevated inter-cluster divergence indicate an effective exploration-exploitation balance. The ability to predict and explain indirect failure chains (e.g., GWS-induced electrical load leading to degraded TCS performance) underscores the relevance of physics-based digital twins for interpretability and scenario analysis in life-support systems. Limitations in simulation runtime and data volume constrain the extent of training and deployment databases, but faster-than-real-time execution still permits forward-looking predictions and supports practical deployment strategies for online monitoring and early fault prediction.
This work proposes and demonstrates an active fault detection and prediction framework for space habitats using reinforcement-learned adversarial agents coupled to a validated, faster-than-real-time digital twin. A physical mock habitat (TCS, GWS, electrical) was designed, instrumented, and exercised under healthy and faulted conditions; an FMEA and design of experiments guided fault elicitation. The integrated Simscape model compiled as an FMU achieved acceptable accuracy (≤7% mean absolute deviation) and enabled safe training and testing. Multiple ReLAAs trained via neuroevolution with diversity preservation identified diverse damaging actions, including indirect, cascading faults across subsystems. These insights can help operators mitigate or prevent faults before they occur. Data and code are available upon request. Future work can focus on improving simulation efficiency and scalability, expanding the action/state spaces, and integrating additional ECLSS subsystems to capture more complex fault interactions.
- Simulation runtime and data volume: Each FMU-based simulation takes minutes and produces large datasets, limiting total training iterations and the size of the deployment database.
- Model fidelity vs. speed tradeoff: The digital twin prioritized execution speed over maximum accuracy (albeit with ≤7% mean absolute deviation), which may affect fine-grained fault dynamics.
- FMU/state initialization constraints: Due to FMU and sensor limitations, initializing arbitrary complex internal states required nearest-neighbor retrieval (Ball Tree) from a database of stored states, potentially introducing approximation errors.
- Generalizability to other habitats: While physics-based modeling supports transfer, validation on additional configurations and subsystems would be required to ensure broad applicability.