
Nanosecond anomaly detection with decision trees and real-time application to exotic Higgs decays
S. T. Roche, Q. Bayer, et al.
S. T. Roche, Q. Bayer, and collaborators developed an autoencoder built from deep decision trees and deployed on an FPGA for nanosecond-scale anomaly detection at the LHC. Trained only on Standard Model data, the system increases real-time sensitivity to rare Higgs boson decays while operating within tight resource budgets, pointing toward broader edge-AI applications.
Introduction
The study addresses the challenge of identifying rare beyond-the-Standard-Model (BSM) signals in real time at the LHC trigger level, where bandwidth and latency constraints limit sensitivity. A motivating case is the exotic Higgs decay H → aa → γγjj, in which one light pseudoscalar (e.g., m_a ≈ 10 GeV) decays to γγ and the other to jj; conventional threshold-based diphoton triggers miss significant portions of this phase space at low diphoton mass (m_γγ < 20 GeV). The research proposes an interpretable, low-latency autoencoder based on deep decision trees, implemented on an FPGA, to detect anomalies indicative of BSM processes without signal-specific training. The goals are to increase signal acceptance at a fixed background trigger rate and to demonstrate feasibility of real-time deployment with nanosecond-scale inference, benefiting LHC experiments and other resource-constrained edge-AI applications.
Literature Review
The paper situates its work within the extensive LHC anomaly-detection literature, noting that most prior approaches rely on neural networks (e.g., VAEs, DNNs, normalizing flows, graph NNs) trained offline and evaluated on stored data, with emerging results in ATLAS and CMS analyses. Prior FPGA implementations of neural-network autoencoders achieved latencies of roughly 80 to 1480 ns (Govorkova et al., Nat. Mach. Intell. 2022). Tree-based and forest autoencoders exist (e.g., AutoEncoder by Forest; autoencoder trees) but may be less suited to direct FPGA deployment without specialized design. The authors build on their prior FPGA work with deep decision trees (fwXmachina), which uses threshold comparisons for efficient, interpretable, low-latency implementation. They also compare performance on a public LHC physics dataset against the neural network FPGA baseline, showing comparable AUCs while reducing latency and, in some categories, resource use.
Methodology
Architecture and anomaly score:
- The autoencoder (AE) uses a forest of T deep decision trees, each with maximum depth D. Each tree partitions the V-dimensional input space into hyperrectangular bins P_b. For each bin, the decoded output x* is the component-wise median of the training data in that bin, which minimizes the expected L1 distance for SM-like inputs.
- A single coder (the "star-coder") executes encoding and decoding simultaneously, bypassing an explicit latent vector: functionally, the reconstruction x* = x̂(x) is obtained by a bin lookup per tree. The per-tree L1 distance between input and reconstruction is computed and summed over trees to yield the anomaly score Δ (see the scoring sketch after this list).
- Inputs are scaled to physical ranges (e.g., angles [0, 2π), momenta up to kinematic endpoints) and converted to N-bit integers for firmware.
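The scoring step can be made concrete with a minimal sketch. It assumes each trained tree is stored as nested tuples, where an internal node is (feature, threshold, left, right) and a leaf holds the component-wise median of the SM training events in its bin; the names and data layout are illustrative, not the paper's firmware representation.

```python
import numpy as np

def decode(tree, x):
    """Walk one tree to the bin containing x; return that bin's stored median."""
    while isinstance(tree, tuple):            # internal node: (feat, thr, left, right)
        feat, thr, left, right = tree
        tree = left if x[feat] < thr else right
    return tree                               # leaf: median vector of the bin

def anomaly_score(forest, x):
    """Delta = sum over trees of the L1 distance between x and its reconstruction."""
    return sum(np.abs(x - decode(tree, x)).sum() for tree in forest)

# Toy 2D example: one depth-1 tree splitting on feature 0 at 0.5,
# with per-bin medians taken from hypothetical SM-like training points.
leaf_lo = np.array([0.2, 0.3])   # median of training events with x[0] < 0.5
leaf_hi = np.array([0.8, 0.7])   # median of training events with x[0] >= 0.5
forest = [(0, 0.5, leaf_lo, leaf_hi)]

print(anomaly_score(forest, np.array([0.21, 0.29])))  # SM-like: small score
print(anomaly_score(forest, np.array([0.95, 0.05])))  # anomalous: larger score
```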
ML training (Decision Tree Grid, DTG):
- Recursive, importance-sampled splitting: for the current sample s at depth d, build PDFs of the bit-integer input variables, sample the variable distribution (weighted by importance) to choose a split variable x_i, then sample that variable's PDF to choose a threshold c and split on x_i < c. Recurse until the depth limit D or a minimum sample fraction f is reached; the recorded cuts form the Decision Tree Grid (DTG). A training sketch follows this list.
- Weighted randomness in variable and threshold selection yields a non-identical forest whose aggregate improves accuracy; no boosting weights are used (single-sample training on SM only).
- Compression via the latent dimension T is not required; performance arises from density estimation. Taking the number of trees T as the effective latent size, T/V ≈ 4 for the benchmark (no compression) and T/V ≈ 0.5 for the LHC physics dataset (compressed).
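A loose sketch of this training recursion, under stated assumptions: here the split-variable probability is proportional to each variable's spread, and the threshold is drawn from that variable's empirical distribution, both illustrative stand-ins for the paper's exact PDF-based importance sampling. Leaves store component-wise medians, matching the scoring sketch above.

```python
import numpy as np

rng = np.random.default_rng(0)

def grow(X, depth, max_depth=6, min_frac=0.01, n_total=None):
    """Recursively split sample X; return a nested-tuple tree with median leaves."""
    n_total = len(X) if n_total is None else n_total
    if depth == max_depth or len(X) < max(2, min_frac * n_total):
        return np.median(X, axis=0)                          # leaf: bin median
    spread = X.std(axis=0) + 1e-12
    feat = rng.choice(X.shape[1], p=spread / spread.sum())   # weighted variable pick
    thr = rng.choice(X[:, feat])                             # threshold from empirical PDF
    lo, hi = X[X[:, feat] < thr], X[X[:, feat] >= thr]
    if len(lo) == 0 or len(hi) == 0:                         # degenerate cut: stop
        return np.median(X, axis=0)
    return (feat, thr,
            grow(lo, depth + 1, max_depth, min_frac, n_total),
            grow(hi, depth + 1, max_depth, min_frac, n_total))

def train_forest(X, n_trees=30, max_depth=6):
    """Randomness in variable/threshold choice makes the trees non-identical."""
    return [grow(X, 0, max_depth) for _ in range(n_trees)]

# SM-only training on 8-bit integer inputs, as in the benchmark configuration.
X_sm = rng.integers(0, 256, size=(5000, 8)).astype(float)
forest = train_forest(X_sm, n_trees=30, max_depth=6)
```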
Datasets and simulation:
- Benchmark (γγjj): Training uses 500k simulated SM events producing γγjj at 13 TeV (a cocktail weighted by cross sections). Testing uses 500k SM events plus 100k signal events each for H_125 → a_10 a_70 → γγjj and a cross-check sample H_70 → a_5 a_50 → γγjj. Events are generated with MadGraph5_aMC 2.9.5 (LO), showered and decayed with Pythia8, and passed through detector simulation and reconstruction with Delphes 3.5.0 (CMS card).
- Inputs: eight features from the two leading photons (γ1, γ2) and two leading jets (j1, j2): pT(γ1), pT(γ2), pT(j1), pT(j2), ΔR_γγ, ΔR_jj, m_γγ, m_jj; feature construction is sketched after this list. Photons are reconstructed with pT > 0.5 GeV; jets use anti-kT (R = 0.4) with pT > 20 GeV.
- Preselection for AE training and evaluation: events with ≥2 photons, ≥2 jets, and m_γγ < 20 GeV.
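As referenced above, a sketch of how the eight benchmark features could be assembled from the two leading photons and jets, assuming massless four-vectors parameterized by (pT, η, φ); the helper names are illustrative, not the paper's code.

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    return np.sqrt((eta1 - eta2) ** 2 + dphi ** 2)

def inv_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    # m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi)) for massless particles
    return np.sqrt(2 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2)))

def features(g1, g2, j1, j2):
    """(pT, eta, phi) triples -> 4 pTs, dR_gg, dR_jj, m_gg, m_jj."""
    return np.array([
        g1[0], g2[0], j1[0], j2[0],
        delta_r(g1[1], g1[2], g2[1], g2[2]),
        delta_r(j1[1], j1[2], j2[1], j2[2]),
        inv_mass(*g1, *g2),
        inv_mass(*j1, *j2),
    ])

def passes_preselection(x):
    """>=2 photons and >=2 jets assumed upstream; keep only m_gg < 20 GeV."""
    return x[6] < 20.0

x = features((30, 0.1, 0.2), (25, -0.3, 1.0), (60, 1.2, -2.0), (40, -0.8, 2.5))
print(x, passes_preselection(x))
```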
Trigger baselines and operating points:
- ATLAS-inspired diphoton trigger approximation: require pT ≥ 25 GeV for each of the two leading reconstructed photons, emulating thresholds that are fully efficient under first-level and high-level trigger conditions; a worst-case background composition is assumed, attributing the full 3 kHz diphoton rate to SM background. Triggers are compared at this fixed SM rate, as sketched below.
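A sketch of the two operating points under stated assumptions: the diphoton baseline keeps events in which both leading photons exceed the pT threshold, while the AE trigger keeps the highest-anomaly-score events, with its cut tuned so the SM pass rate matches the target. The 100 kHz input rate and the score distribution are placeholders, not figures from the paper.

```python
import numpy as np

def diphoton_trigger(pt_g1, pt_g2, threshold=25.0):
    """ATLAS-inspired baseline: both leading photons above the pT threshold."""
    return (pt_g1 >= threshold) & (pt_g2 >= threshold)

def ae_threshold_at_rate(sm_scores, target_rate_khz, input_rate_khz):
    """Pick the anomaly-score cut whose SM pass fraction yields the target rate."""
    keep_frac = target_rate_khz / input_rate_khz
    return np.quantile(sm_scores, 1.0 - keep_frac)

rng = np.random.default_rng(1)
sm_scores = rng.exponential(1.0, 100_000)        # stand-in SM anomaly scores
cut = ae_threshold_at_rate(sm_scores, target_rate_khz=3.0, input_rate_khz=100.0)
print(cut, (sm_scores > cut).mean())             # ~3% of SM events kept
```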
Training configurations:
- Benchmark γγjj AE: forest of T=30 trees, max depth D=6, 8-bit inputs for firmware emulation.
- LHC physics dataset: a public dataset with 56 features (kinematics of jets, electrons, and muons plus missing transverse energy); preselection requires one lepton (e or μ) with pT > 23 GeV and |η| within specified ranges. The AE is trained with T=30, D=4 on the SM cocktail and evaluated on BSM signals (LQ_50→bτ, A_50→4τ, h_60→ττ, h_60→γγ). A reduced-input variant trained on only 26 features serves as a cross-check.
Firmware implementation and validation:
- Implemented on a Xilinx Virtex UltraScale+ xcvu9p (VCU118) with a 200 MHz clock. Inputs are converted to 8-bit ap_int values (see the quantization sketch after this list). The autoencoder processor comprises multiple Deep Decision Tree Engines (DDTE) feeding a Distance Processor that computes the L1 distances and their sum.
- Latency and resource use were measured and verified via C simulation, RTL co-simulation, and on-hardware tests using the Xilinx Integrated Logic Analyzer (ILA). Designs were synthesized with Vivado HLS (Vitis HLS for larger designs).
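The input conversion can be emulated bit-accurately in software before synthesis. A minimal sketch, assuming each physical variable is affine-scaled onto its expected range and truncated to an 8-bit integer; the variable ranges below are illustrative placeholders.

```python
import numpy as np

def quantize(x, lo, hi, bits=8):
    """Map [lo, hi) onto {0, ..., 2^bits - 1} with clipping, mimicking fixed-width inputs."""
    levels = (1 << bits) - 1
    q = np.floor((x - lo) / (hi - lo) * (levels + 1))
    return np.clip(q, 0, levels).astype(np.int64)

# e.g. an angle in [0, 2*pi) and a pT with an assumed 500 GeV kinematic endpoint
print(quantize(np.array([0.0, np.pi, 6.28]), 0.0, 2 * np.pi))   # -> [  0 128 255]
print(quantize(np.array([12.5, 250.0, 900.0]), 0.0, 500.0))     # overflow clipped to 255
```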
Key Findings
- Real-time performance (benchmark γγjj configuration on xcvu9p, 200 MHz): latency 30 ns (6 clock ticks), initiation interval 5 ns (1 tick); resource usage approximately 0% BRAM/URAM, 0.1% DSP (8), 0.6% FF (~15k), and 5.4% LUT (~63k). The paper's summary quotes slightly coarser figures of about 7% LUT and ~1% FF with negligible DSP usage.
- Benchmark trigger acceptance at fixed SM rate (3 kHz):
• ATLAS-inspired diphoton trigger: SM acceptance 0.31%; H_125 signal 2.2%; H_70 signal 0.01%.
• Autoencoder trigger (after preselection m_γγ < 20 GeV; preselection acceptance: SM 38%, H_125 53%, H_70 29%): at the same 3 kHz SM rate, H_125 acceptance is 6.1% (roughly 3× the diphoton trigger) and H_70 acceptance is 1.4% (far above 0.01%).
- LHC physics dataset (56 variables, T=30, D=4): anomaly detection comparable to the neural network FPGA baseline (DNN VAE, post-training quantized to 8 bits). Reported AUCs (ours, with the baseline in parentheses): LQ_50→bτ 0.93 (0.927); A_50→4τ 0.93 (0.947); h_60→ττ 0.85 (0.817); h_60→γγ 0.94 (0.947); an evaluation sketch follows this list. FPGA cost: latency 30 ns (6 ticks), interval 5 ns, with lower DSP and BRAM usage than prior work and comparable or lower FF/LUT.
- Reduced-input cross-check (26 features): AUC within ~1% of 56-feature result; latency ~35 ns (7 ticks), interval 5 ns; resources substantially reduced (~9k FF, 61k LUT, 26 DSP, no BRAM). Similar costs also achieved on a smaller FPGA (Xilinx Zynq UltraScale+ xczu7ev).
- Robustness to signal contamination during training: performance degrades gracefully as contamination increases from 1% up to 33%, yet even at 33% contamination the AE outperforms the ATLAS-inspired diphoton trigger by about a factor of two in H_125 acceptance at the same SM rate.
- Interpretability: tree cuts and decision paths enable visual inspection of selections; aids understanding of trigger behavior and potential disentangling of detector effects from genuine BSM anomalies.
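As referenced in the LHC-dataset finding above, a sketch of how an AUC comparison could be reproduced: score SM and BSM test events and compute the ROC area, treating the anomaly score as the classifier output. The score distributions here are synthetic placeholders; in the paper they come from the trained autoencoder.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
sm_scores = rng.exponential(1.0, 50_000)     # background test events (label 0)
bsm_scores = rng.exponential(3.0, 5_000)     # signal test events (label 1)

labels = np.concatenate([np.zeros_like(sm_scores), np.ones_like(bsm_scores)])
scores = np.concatenate([sm_scores, bsm_scores])
print(f"AUC = {roc_auc_score(labels, scores):.3f}")
```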
Discussion
The decision tree-based autoencoder achieves low-latency, resource-efficient anomaly detection directly compatible with FPGA-based first-level trigger systems. Training on SM-only data enables model-agnostic sensitivity to rare BSM processes such as exotic Higgs decays with light pseudoscalars, addressing the trigger bottleneck in the m_γγ < 20 GeV region. At fixed SM trigger rates, the AE substantially increases signal acceptance compared to a conventional diphoton trigger approximation, demonstrating practical gains for real-time data collection. On a broader LHC physics dataset, performance is comparable to neural network baselines while improving latency and reducing certain FPGA resource needs, enhancing deployability on resource-constrained platforms. The approach is interpretable, facilitating operational diagnostics and trust in trigger decisions. Training with contaminated data remains viable, supporting adaptive strategies where autoencoders are periodically retrained on collected data and then deployed on future streams. The findings indicate that decision-tree AEs can serve as robust, ultra-fast anomaly detectors for edge AI applications in HEP and beyond, particularly within forthcoming HL-LHC trigger environments.
Conclusion
This work introduces and deploys an interpretable, deep decision tree-based autoencoder for real-time anomaly detection on FPGA, achieving 30 ns latency with low resource usage. Trained on SM-only samples, it significantly enhances acceptance for exotic Higgs decays H → aa → γγjj at fixed background rates and performs comparably to neural network approaches on a public LHC dataset, while offering improved latency and interpretability. The method is robust to realistic levels of signal contamination during training and is validated through simulation, RTL co-simulation, and on-hardware tests. Future directions include extending to raw detector inputs (from thousands to hundreds of millions of channels), exploring alternative variable sets to mitigate coordinate-dependence of anomaly scores, integrating control-region and latent-space analyses to isolate BSM contributions, and deploying adaptive training pipelines that leverage collected data under HL-LHC conditions.
Limitations
- Dependence on preprocessed physics objects (photons, jets, leptons); current design assumes availability of reconstructed objects at trigger level, not raw detector channels.
- The anomaly score depends on the choice of input variables and coordinate transformations: a density picks up a Jacobian factor under a change of variables, so the same event can receive different scores in different coordinates, potentially affecting selection stability. Mitigation may require orthogonal variable sets or latent-space analyses.
- Preselection (e.g., requiring m_γγ < 20 GeV and ≥2 photons/jets) constrains applicability and may bias accepted event populations.
- Simulation-based evaluation with idealized assumptions (e.g., neglect of pileup) may not capture all real detector effects; performance in full HL-LHC conditions could differ.
- Hardware synthesis results vary across toolchains (Vivado vs Vitis) and configurations; larger designs may incur higher latency and resource usage.
- Generalization is limited by the trigger environment (e.g., single-lepton preselection in LHC physics dataset), which restricts applicability to events passing existing triggers.