
Enhanced detection of threat materials by dark-field x-ray imaging combined with deep neural networks
T. Partridge, A. Astolfo, et al.
Research by T. Partridge, A. Astolfo, S. S. Shankar, F. A. Vittoria, M. Endrizzi, S. Arridge, T. Riley-Smith, I. G. Haig, D. Bate, and A. Olivo shows that combining dark-field x-ray imaging with deep neural networks significantly enhances the detection of threat materials. The study presents proof-of-concept experiments demonstrating marked improvements in material identification, with promising implications for security screening technologies.
Introduction
The study investigates whether combining x-ray dark-field imaging with conventional attenuation and multi-energy information can enhance discrimination of threat materials (e.g., explosives) from benign substances, and whether deep neural networks (DNNs) can exploit the dark-field texture for detection in realistic, cluttered scenarios. Phase-based x-ray methods enhance detail visibility; dark-field captures ultra-small-angle scattering from microstructural inhomogeneities below system resolution, providing complementary information to attenuation and refraction. The authors hypothesize that: (1) dark-field signals carry material-specific textures enabling improved discrimination; (2) multi-energy acquisition can resolve remaining ambiguities and remove thickness dependence via ratios; and (3) convolutional neural networks can detect explosive textures even under overlap/clutter, outperforming attenuation-only approaches.
Literature Review
The paper situates its work within decades of development of phase-based x-ray imaging, from early interferometry (Bonse & Hart) and synchrotron-based phase contrast, to laboratory implementations using gratings and edge-illumination (EI). Prior work established multi-modal retrieval (attenuation, refraction/phase gradient, and dark-field) via analyzer crystals, gratings, and EI, demonstrating dark-field sensitivity to sub-resolution microstructure and its complementarity to other channels. The security imaging literature has focused largely on attenuation-based dual-energy methods and, more recently, deep learning for baggage imaging; however, the integration of dark-field signals and DNNs for threat detection remains underexplored. This study builds upon EI dark-field methods and leverages transfer learning and texture recognition advances from computer vision to address material discrimination in security contexts.
Methodology
Imaging and signal acquisition: A laboratory EI x-ray imaging system was used with an X-Tek 160 tungsten-anode source (≈80 µm focal spot; 80 kVp, increased to 120 kVp for electrical items study). Detection used an XCounter XC-FLITE FX2 CdTe photon-counting detector (2048×128 pixels, 100 µm pitch) with two energy thresholds to split the spectrum into low and high energy bins. Two gold-on-graphite masks (pre-sample and detector) were used; the pre-sample mask had a 4-way asymmetric aperture pattern to enable robust phase retrieval and extraction of an additional offset image due to partial septal transmission. Components were mounted with precision motorized stages for alignment and scanning.
Phase retrieval and signal formation: EI illumination curves (ICs) were recorded by scanning the pre-sample mask relative to the stationary detector mask. Introducing a sample shifts the IC (refraction), broadens it (dark-field), and reduces its area (attenuation). Gaussian models approximate the ICs; a 4-Gaussian grouped fit was used for robustness. From object-free and object-present ICs, the following quantities were extracted per energy bin: transmission t, a dark-field parameter proportional to σ²−σ₀² (the difference of squared FWHM with and without the sample), refraction, and an offset attenuation-like image o derived from high-energy photons transmitted through the mask septa. Signals were acquired at low and high energies to exploit the different energy dependencies of attenuation and dark-field. Thickness linearization was verified for σ²−σ₀², τ (attenuation), and o across scanned and static acquisitions, enabling the formation of thickness-independent ratios by dividing corresponding high- and low-energy signals.
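The retrieval step described above can be sketched numerically. The paper uses a grouped 4-Gaussian fit; as a simplified stand-in, the sketch below estimates the Gaussian IC parameters from statistical moments (area, centre, variance), which coincide with the fit parameters for noise-free Gaussian curves. The function names are illustrative, not from the paper.

```python
import numpy as np

def gaussian(x, area, mu, fwhm):
    """Gaussian illumination-curve (IC) model with given area, centre, and FWHM."""
    s = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def ic_moments(x, ic):
    """Area, centre, and squared FWHM of a sampled IC, via moments."""
    dx = x[1] - x[0]
    area = ic.sum() * dx
    mu = (x * ic).sum() * dx / area
    var = (((x - mu) ** 2) * ic).sum() * dx / area
    return area, mu, 8.0 * np.log(2.0) * var  # FWHM^2 = 8 ln(2) * variance

def retrieve(x, ic_ref, ic_obj):
    """Compare object-free and object-present ICs.

    Returns (transmission t, refraction shift, dark-field sigma^2 - sigma0^2)."""
    a0, m0, f0sq = ic_moments(x, ic_ref)
    a1, m1, f1sq = ic_moments(x, ic_obj)
    return a1 / a0, m1 - m0, f1sq - f0sq
```

For a sample that halves the IC area, shifts it by 0.2, and broadens its FWHM from 1.0 to 1.5 (arbitrary units), `retrieve` recovers t = 0.5, refraction = 0.2, and dark-field = 1.5² − 1.0² = 1.25.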
Material datasets and experiments without CNNs: Threat and non-threat materials were first imaged in fixed-thickness acrylic boxes (5 mm), then across multiple thicknesses (2.5–37.5 mm) using containers of 2.5, 5, 10, and 20 mm. Scatterplots of transmission vs dark-field showed clustering and differences in microstructural spread. To remove thickness dependence and improve discrimination, high/low-energy ratios of σ²−σ₀² (R_ds), τ (R_ab), and o (R_or) were computed and plotted in a 3D space; a region separating threats from non-threats was identified.
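Why dividing high- by low-energy signals cancels thickness: if a signal grows linearly with thickness, the thickness factor appears in both numerator and denominator of the ratio. A minimal sketch with illustrative per-unit-thickness coefficients (not values from the paper):

```python
import numpy as np

# Hypothetical per-mm signal coefficients for one material at two energy bins.
ATT_LOW, ATT_HIGH = 0.30, 0.18   # attenuation tau per mm (illustrative)
DF_LOW,  DF_HIGH  = 0.050, 0.020 # dark-field sigma^2 - sigma0^2 per mm (illustrative)

thicknesses = np.array([2.5, 5.0, 10.0, 20.0, 37.5])  # mm, range used in the study

# Signals linearize with thickness...
tau_low, tau_high = ATT_LOW * thicknesses, ATT_HIGH * thicknesses
df_low,  df_high  = DF_LOW * thicknesses,  DF_HIGH * thicknesses

# ...so the high/low-energy ratios depend only on the material, not thickness.
R_ab = tau_high / tau_low
R_ds = df_high / df_low
```

Every element of `R_ab` equals 0.18/0.30 = 0.6 and every element of `R_ds` equals 0.02/0.05 = 0.4 regardless of thickness, which is what lets the ratios serve as coordinates in the 3D discrimination space.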
Proof-of-concept CNN Test 1 (overlapping threat vs non-threat in bags): Approximately 200 multi-modal image instances (per random split) of bags were collected: ~100 containing one explosive sample (from 6 explosives) and ~100 containing one benign material (from 6 non-threats; cheese replaced an explosive simulant), with additional cluttering items. Input per instance was a horizontal concatenation of t, (σ²−σ₀²), and offset o images. Ten random 70/30 train/test splits were used; k-fold cross-validation during training; two constrained random crops per image for both train and test. Transfer learning was applied using ImageNet-pretrained GoogleNet and Inception-ResNet, with 1–3 added 640-node fully connected layers; softmax, cross-entropy, and hinge losses were evaluated. An ablation established hinge loss superior to cross-entropy, and Inception-ResNet superior to GoogleNet; best results used one added FC layer and hinge loss. A Type II architecture augmented texture recognition by training an Inception-ResNet on the Describable Textures Dataset (47 texture classes) and concatenating its 47-D output with the 640-D feature vector before hinge-loss training on explosives vs non-explosives.
Training details (PoC1): Optimizer ADAM (β=0.99), L2 regularization λ=1e-4, initial LR 0.001 with exponential decay (step 100, factor 0.96), dropout 0.5, batch size 32, 40 epochs, 2 crops during inference, TensorFlow, NVIDIA GTX 1080Ti.
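The ablation above found hinge loss preferable to cross-entropy for separating subtle textures. For reference, the two binary losses can be written in a few lines of NumPy (the scores and labels below are illustrative, not data from the study):

```python
import numpy as np

def hinge_loss(scores, labels):
    """Binary hinge loss; labels in {-1, +1}, scores are raw classifier margins.
    Only examples with margin < 1 contribute, which focuses training on
    borderline (hard-to-separate) cases."""
    return np.mean(np.maximum(0.0, 1.0 - labels * scores))

def cross_entropy_loss(scores, labels):
    """Binary cross-entropy on sigmoid(scores); labels in {-1, +1}."""
    p = 1.0 / (1.0 + np.exp(-scores))
    y = (labels + 1) / 2  # map {-1, +1} -> {0, 1}
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

scores = np.array([2.3, -1.7, 0.4, -0.2])  # illustrative classifier outputs
labels = np.array([1, -1, 1, -1])
print(hinge_loss(scores, labels))  # -> 0.35 (only the two small-margin examples count)
```

The hinge loss's zero gradient on confidently classified examples is one plausible reason it helped here: optimization pressure concentrates on the subtle-texture cases near the decision boundary.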
Proof-of-concept CNN Test 2 (C4 concealed in electrical items): Approximately 576 scans of bags containing a laptop, mobile phone, or hair-dryer with or without concealed C4; 80 were held out for testing. To mitigate thickness variation, ratio images were created by dividing dark-field and attenuation images, preserving material texture cues. A split (stacked) CNN architecture was developed: first layer segments and segregates electronic item compositions; second layer discriminates C4 textures within segmented objects. GoogleNet backbone with softmax loss and transfer learning (ImageNet pretraining); three CNNs were used in the second stack for best empirical performance. No data augmentation or cropping; MOMENTUM optimizer (0.99), L2 λ=1e-4, LR 0.001 with exponential decay (step 200, factor 0.92), dropout 0.5, batch size 32. A formal de visu trial was conducted with four trained security officers evaluating 80 single-energy attenuation images under time constraints; CNN inference on the same 80 images was compared with and without dark-field images.
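The ratio-image step for Test 2 amounts to a per-pixel division of the dark-field image by the attenuation image, so that thickness variation common to both channels is suppressed while texture contrast survives. A minimal sketch; the `eps` floor is an assumption added here (not a value from the paper) to keep near-empty pixels finite:

```python
import numpy as np

def ratio_image(dark_field, attenuation, eps=1e-6):
    """Per-pixel dark-field / attenuation ratio image.

    eps is an illustrative safeguard, not from the paper: it floors the
    denominator so pixels with near-zero attenuation do not blow up."""
    return dark_field / np.maximum(attenuation, eps)
```

Because a material twice as thick roughly doubles both signals, the ratio stays near the material's characteristic value, preserving faint explosive texture cues even where attenuation contrast alone is weak.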
Key Findings
- Multi-energy dark-field plus attenuation enables material discrimination and reduces thickness dependence: Ratios of high/low-energy signals for dark-field (R_ds), attenuation (R_ab), and offset (R_or) form a 3D space where a region separates threat from non-threat materials (Fig. 2(d)). Dark-field and attenuation signals linearize with thickness, enabling thickness-independent ratios; linearity held for scanned acquisitions (Supplementary Fig. 2) and across ratios (Supplementary Fig. 3).
- PoC CNN Test 1 (bags with overlapping items): Best-performing Type II architecture (Inception-ResNet + 1 FC layer + hinge loss + fused 47-D texture output) achieved 598 TP, 600 TN, 0 FP, 2 FN over aggregate tests, corresponding to 99.6% recall, 100% precision, and 99.8% accuracy. Removing the dark-field channel (using only t and o) reduced accuracy to 93.6% (−6.2 pp) indicating the critical contribution of dark-field texture. Across architectures, Inception-ResNet generally outperformed GoogleNet; hinge loss outperformed cross-entropy and softmax for subtle texture segregation. Best results were obtained without data augmentation.
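As a sanity check, the headline rates for Test 1 follow directly from the reported confusion-matrix counts:

```python
# Aggregate confusion counts reported for the best Type II architecture.
TP, TN, FP, FN = 598, 600, 0, 2

recall    = TP / (TP + FN)                    # 598/600
precision = TP / (TP + FP)                    # 598/598 = 1.0
accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 1198/1200
```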
- PoC CNN Test 2 (C4 concealed in electronics): On 80 held-out test images, the split-network CNN achieved 100% true positive rate with 17.5% false positives and total processing time of ~1 min 58 s for the set. Human operators (de visu) achieved 48.8% TP with 30.6% FP and ~8 min 45 s. Removing dark-field images decreased CNN precision by ~11% and overall accuracy by ~20%, again demonstrating the importance of dark-field for subtle texture discrimination.
- Practical insights: Ratio imaging preserves faint explosive texture cues even when attenuation contrast is weak (e.g., small C4 amounts in laptops/phones). Standard CNNs (GoogleNet, Inception-ResNet) also benefited from including dark-field signals (Supplementary Table 1).
Discussion
Combining dark-field with attenuation, especially at multiple energies, effectively addresses key challenges in material discrimination: it enhances separability by exploiting complementary microstructural texture (dark-field) and energy dependencies, and removes thickness dependence via ratios. However, 2D projection imaging cannot fully disentangle overlapping materials; while renormalization against background can factor out two overlapping signals when isolated regions exist, this is labor-intensive and not always feasible. The proposed CNN approaches leverage the textural nature of dark-field to detect explosive signatures within cluttered and overlapping contexts. Preliminary results show high detection rates, surpassing human performance in a de visu trial, and demonstrate that excluding dark-field significantly degrades performance—highlighting dark-field’s unique contribution. Nonetheless, caveats include limited datasets, heuristic network design, and constrained testing scenarios (single explosive type in Test 2; single target per sub-image in Test 1; human operators viewed only single-energy attenuation images). The study suggests that principled architecture design informed by texture analysis and sample complexity theory, along with larger datasets, could further improve robustness and generalizability. The findings underscore that multi-energy dark-field signals and DNN-based texture analysis can substantially augment current attenuation-only dual-energy methods in security screening and other applications requiring microstructure-based material discrimination.
Conclusion
This work demonstrates that dark-field x-ray imaging provides material-specific textures that, when combined with attenuation and multi-energy acquisition, enable enhanced discrimination of threat materials. Thickness effects can be neutralized using high/low-energy ratios, and a 3D ratio space can separate threat from non-threat classes. Incorporating dark-field images into CNN pipelines significantly improves detection performance in cluttered scenarios, achieving near-perfect accuracy in a small-scale PoC and outperforming trained human operators in detecting concealed C4 in electronics. The approach shows promise for security screening and broader domains (materials science, industrial NDT, medical imaging) where microstructural differences are relevant. Future work should involve larger, more diverse datasets; systematic architecture design incorporating insights from texture analysis; exploration of additional dark-field extraction methods (e.g., gratings); and comprehensive evaluations under realistic operational conditions with multi-energy attenuation displays.
Limitations
- Dataset scale and diversity: PoC Test 1 used limited samples with one target material per sub-image; PoC Test 2 involved a single explosive (C4) concealed in three electronic items, limiting generalizability.
- Heuristic network design: Architectures were iteratively refined (transfer learning, texture fusion, stacked networks) rather than derived from principled capacity/sample complexity analyses; broader ablations were constrained by data scarcity.
- Human comparison constraints: Operators viewed single-energy attenuation images (not the dual-energy color-coded images used in practice), likely underestimating human performance in realistic settings.
- Projection-mixing limitation: 2D imaging cannot fully unmix overlapping materials; background-based renormalization is possible but laborious and contingent on isolated regions.
- No data augmentation: While augmentation reduced performance (likely due to altering dark-field texture statistics and EI’s direction sensitivity), this limits robustness to real-world variability; careful, physics-consistent augmentation strategies were not explored.
- Convergence/architecture constraints: Inception-ResNet showed intermittent convergence in Test 2; reasons not fully investigated.
- Security-driven data availability: Datasets are not publicly available, limiting external validation.