Artificial intelligence and thermodynamics help solving arson cases

Engineering and Technology

S. Korver, E. Schouten, et al.

In this study, Sander Korver, Eva Schouten, Othonas A. Moultos, Peter Vergeer, Michiel M. P. Grutters, Leo J. C. Peschier, Thijs J. H. Vlugt, and Mahinder Ramdin show how machine learning and thermodynamic modeling can be combined to predict the initial composition of a gasoline sample from weathered evidence. Prediction errors stay within a few percent even for severely weathered samples, which makes the method a promising way to link fire scenes to suspects in arson investigations.

Introduction

The study addresses a key forensic challenge in arson investigations: linking a weathered gasoline sample from a fire scene to an unweathered gasoline sample found with a suspect. Weathering changes the composition of gasoline through evaporation and possibly other processes, which complicates direct comparison. Gasoline is a complex mixture of many components with different volatilities and potentially nonideal interactions, and the degree of weathering at a crime scene is typically unknown. The aim is a method that accurately back-traces the initial (unweathered) composition of a weathered gasoline sample, so that it can be compared robustly with a suspect sample. This matters because existing approaches struggle with highly weathered samples, handle only a limited number of components, require extensive experimental datasets, or cannot predict the initial composition of a weathered sample, which is exactly what a comparison with an unweathered suspect sample requires.

Literature Review

Prior approaches include statistical methods without explicit evaporation models, such as principal component analysis (PCA), linear discriminant analysis (LDA), canonical variate analysis (CVA), hierarchical cluster analysis (HCA), and covariance mapping, as well as likelihood ratio frameworks. Other studies used explicit evaporation models based on Raoult’s law, gas chromatographic retention data, or experimental methods. These methods handle only a limited number of components, apply poorly to highly weathered samples, need large amounts of data, or cannot predict the original unweathered composition from a weathered sample. The literature also shows that nonideality and polar components (e.g., ethanol, ethers) significantly influence evaporation behavior, which calls for models that include activity coefficients for multicomponent, nonideal mixtures.

Methodology

The approach combines thermodynamic modeling of evaporation, quantum-chemistry-derived activity coefficients, and machine learning to estimate the degree of weathering.

Thermodynamics: A gamma–phi vapor–liquid equilibrium framework models evaporation through a differential mass loss equation for each component, assuming a well-mixed liquid, an ideal gas phase, a uniform gas-phase mass transfer coefficient across components, and constant temperature. The mass transfer coefficient and time are combined into a single dimensionless variable. Saturation vapor pressures were taken from NIST, DIPPR, and DDBSP. Nonideality is captured through activity coefficients computed with COSMO-RS from quantum chemical sigma profiles (ADF software). The evaporation equation is integrated numerically with Heun’s method in Python. Forward integration predicts how the composition changes with increasing evaporation. Backward integration reconstructs the initial composition from a weathered state and needs a stopping criterion equal to the degree of weathering, which is unknown in practice.

Machine learning: An artificial neural network (ANN) estimates the degree of weathering from the composition.

Dataset: 459 gasoline samples were collected in 2011 from 230 petrol stations in the Netherlands (28 brands). Compositions were obtained via GC-FID by integrating 60 well-resolved peaks. Oxygenated components could not be measured because of coelution and were added at typical Dutch concentrations (3 wt% ethanol, 1 wt% MTBE, 1 wt% ETBE) before normalization. Thermodynamic simulations generated evaporation curves for these 60-component mixtures. Training used the full evaporation curves of 400 samples (about 6000 points per sample); validation used the remaining samples (about 360,000 points).

ANN architecture: TensorFlow DNNRegressor with the ProximalAdagrad optimizer; a dense network with 5 hidden layers (1024, 512, 256, 128, 64 nodes). Input: 60-component mole fraction vector. Output: estimated evaporation percentage. The degree of weathering estimated by the ANN is then used as the stopping point for backward integration to infer the unweathered composition.

Activity coefficients: 41 of the 60 components were available in the ADF database; the other 19 were built and their COSMO-RS profiles computed. Temperature effects were examined up to 330 K and found to be small under the tested conditions. Synthetic case studies illustrate the impact of activity coefficients, especially for polar components such as ethanol.
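The forward and backward integration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the rate law `dn_i/dtau = -x_i * gamma_i * p_sat_i` is an assumed dimensionless form of the mass loss equation, the three-component mixture and its property values are hypothetical, the function names are made up for this example, and the mixture is treated as ideal (`gamma = 1`) unless another `gamma_fn` is passed in.

```python
import numpy as np

def ideal(x):
    """Activity coefficients of an ideal mixture (all ones)."""
    return np.ones_like(x)

def rate(n, p_sat, gamma_fn):
    """Assumed dimensionless rate law: dn_i/dtau = -x_i * gamma_i * p_sat_i."""
    x = n / n.sum()
    return -x * gamma_fn(x) * p_sat

def heun_step(n, dtau, p_sat, gamma_fn):
    """One Heun (explicit trapezoidal) step; a negative dtau steps backward."""
    k1 = rate(n, p_sat, gamma_fn)
    k2 = rate(n + dtau * k1, p_sat, gamma_fn)
    return n + 0.5 * dtau * (k1 + k2)

def evaporate(n0, p_sat, molar_mass, loss, gamma_fn=ideal, dtau=1e-3):
    """Forward: integrate until the mass fraction `loss` has evaporated."""
    m0 = np.dot(n0, molar_mass)
    n = np.array(n0, dtype=float)
    while 1.0 - np.dot(n, molar_mass) / m0 < loss:
        n = heun_step(n, dtau, p_sat, gamma_fn)
    return n

def back_trace(n_w, p_sat, molar_mass, loss, gamma_fn=ideal, dtau=1e-3):
    """Backward: step with -dtau until the mass implied by `loss` is restored."""
    m_target = np.dot(n_w, molar_mass) / (1.0 - loss)
    n = np.array(n_w, dtype=float)
    while np.dot(n, molar_mass) < m_target:
        n = heun_step(n, -dtau, p_sat, gamma_fn)
    return n

# Hypothetical three-component mixture: relative vapor pressures and molar
# masses (g/mol). Mole numbers are scaled to sum to 1 so that dtau is small
# compared with the amounts being integrated.
p_sat = np.array([1.0, 0.3, 0.05])
molar_mass = np.array([72.0, 100.0, 120.0])
n0 = np.array([0.4, 0.4, 0.2])

n_weathered = evaporate(n0, p_sat, molar_mass, loss=0.5)
n_recovered = back_trace(n_weathered, p_sat, molar_mass, loss=0.5)
print("weathered mole fractions:", n_weathered / n_weathered.sum())
print("recovered mole fractions:", n_recovered / n_recovered.sum())
```

In this sketch the stopping criterion for `back_trace` is the known `loss`; in the method summarized here that value is not known and comes from the ANN estimate. A component that has evaporated completely has a zero rate in both directions, so it cannot be recovered by stepping backward.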

Key Findings
  • The combined ANN plus thermodynamic model predicts the initial (unweathered) composition of 60-component gasoline samples from weathered compositions with deviations around 4% for volatile/oxygenated components when weathered up to 80 wt%, and less than 1% for many aromatic and non-aromatic components.
  • The ANN predicts degree of weathering with high accuracy: deviations up to about 0.5% for samples evaporated to 10%, and around 3% for samples evaporated to 80%. Overall accuracy for unseen samples is within approximately 3%.
  • Nonideality critically affects evaporation when polar components are present. For a four-component synthetic mixture, ethanol has fully evaporated once about 90 wt% of the mixture is gone under ideal behavior, but already at about 40 wt% when COSMO-RS activity coefficients are used, which shows why activity coefficients must be included.
  • For a seven-component nonpolar synthetic mixture, activity coefficients have minor impact on evaporation behavior; model predictions align with literature data.
  • Temperature effects on composition change were small up to 330 K under the tested assumptions.
  • The method enables inclusion of volatile components in forensic comparisons, as inter-sample variability in unweathered gasoline exceeds the observed prediction deviations (~4%).
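The effect of nonideality on a polar component can be illustrated with a small numerical experiment. The sketch below is not the four-component case from the study and does not use COSMO-RS: the mixture, the rounded vapor pressures near 298 K, the initial mass fractions, and the one-parameter Margules model with `A = 2.0` are all assumptions chosen for illustration, and the rate law `dn_i/dtau = -x_i * gamma_i * p_i` is an assumed dimensionless form. It reports the overall mass fraction evaporated at the point where 99% of the ethanol has left the liquid.

```python
import numpy as np

# Rough, rounded pure-component data near 298 K. Illustrative assumptions,
# not values taken from the study.
names = ["ethanol", "n-heptane", "toluene", "n-decane"]
p_sat = np.array([7.9, 6.1, 3.8, 0.17])                 # kPa, approximate
molar_mass = np.array([46.07, 100.20, 92.14, 142.28])   # g/mol
w0 = np.array([0.05, 0.35, 0.40, 0.20])                 # hypothetical mass fractions
A = 2.0  # hypothetical Margules parameter for ethanol-hydrocarbon pairs

def gamma_ideal(x):
    """Ideal mixture: all activity coefficients equal one."""
    return np.ones_like(x)

def gamma_margules(x):
    """Ethanol (index 0) interacts with every hydrocarbon through the same A;
    hydrocarbon-hydrocarbon pairs are treated as ideal."""
    ln_gamma = np.full_like(x, A * x[0] ** 2)
    ln_gamma[0] = A * (1.0 - x[0]) ** 2
    return np.exp(ln_gamma)

def loss_when_ethanol_gone(gamma_fn, remaining=0.01, dtau=1e-3, max_steps=200_000):
    """Mass fraction evaporated when ethanol drops to `remaining` of its start."""
    p_rel = p_sat / p_sat.max()
    n0 = w0 / molar_mass
    n0 = n0 / n0.sum()          # scale to 1 mol so dtau stays small relative to n
    n = n0.copy()

    def rate(n_now):
        x = n_now / n_now.sum()
        return -x * gamma_fn(x) * p_rel

    for _ in range(max_steps):
        if n[0] <= remaining * n0[0]:
            break
        k1 = rate(n)                      # Heun predictor slope
        k2 = rate(n + dtau * k1)          # Heun corrector slope
        n = n + 0.5 * dtau * (k1 + k2)
    return 1.0 - np.dot(n, molar_mass) / np.dot(n0, molar_mass)

ideal_loss = loss_when_ethanol_gone(gamma_ideal)
nonideal_loss = loss_when_ethanol_gone(gamma_margules)
print(f"ethanol depleted at {100 * ideal_loss:.0f} wt% evaporated (ideal)")
print(f"ethanol depleted at {100 * nonideal_loss:.0f} wt% evaporated (Margules)")
```

With activity coefficients above one, ethanol behaves as if it were far more volatile than its pure-component vapor pressure suggests, so it leaves the liquid much earlier. The direction of the shift matches the finding reported above; the magnitudes depend entirely on the assumed parameters.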
Discussion

The findings demonstrate that the unweathered composition of gasoline can be back-traced accurately from weathered samples by integrating machine learning with thermodynamically rigorous evaporation modeling that accounts for mixture nonideality via COSMO-RS. Estimating the degree of weathering with an ANN resolves the practical problem that the extent of evaporation is unknown, and so supplies the stopping point that backward integration needs. Accounting for activity coefficients is essential in mixtures with polar components, where they change the predicted evaporation pathways dramatically. The observed prediction errors (about 4% for volatile components at high weathering, <1% for many others) are well below the natural variation between gasoline samples, which strengthens forensic discrimination and allows the use of volatile components that were previously problematic. Model performance degrades modestly with increased weathering because low-boiling components are lost, but remains within a few percent. Improvements are possible through additional training data at high weathering and better activity coefficient models for highly concentrated residual mixtures.
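To make the shape of the weathering estimator concrete, the dense network described in the Methodology (60 inputs, hidden layers of 1024, 512, 256, 128, and 64 nodes, one output) can be written as a plain NumPy forward pass. This is only a structural sketch: the study used TensorFlow's DNNRegressor, whereas the weights here are random and untrained, the ReLU activation and the initialization scheme are assumptions, and the function names are made up for this example.

```python
import numpy as np

# Input size, five hidden layers, and one output, as listed in the Methodology.
LAYER_SIZES = [60, 1024, 512, 256, 128, 64, 1]

def init_params(rng):
    """Random (untrained) weights and zero biases, one pair per layer."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
    ]

def forward(params, x):
    """Map mole-fraction vectors of shape (batch, 60) to one value per sample."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)   # ReLU activation (assumed)
    w, b = params[-1]
    return (h @ w + b)[..., 0]           # linear output: estimated evaporation

rng = np.random.default_rng(0)
params = init_params(rng)

# Four hypothetical compositions, normalized so each row sums to one.
x = rng.random((4, 60))
x /= x.sum(axis=1, keepdims=True)

y = forward(params, x)
n_params = sum(w.size + b.size for w, b in params)
print("output shape:", y.shape)
print("trainable parameters:", n_params)
```

Because the weights are untrained, the outputs carry no meaning; the sketch only shows how a 60-component composition maps to a single weathering estimate, which then serves as the stopping criterion for backward integration.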

Conclusion

This work presents a quantitative framework that combines thermodynamic evaporation modeling, COSMO-RS activity coefficients, and an ANN estimator of weathering degree to reconstruct the initial composition of weathered gasoline samples. The approach generalizes to many components and supports forward and backward tracking, enabling robust forensic comparisons between weathered crime-scene samples and unweathered suspect samples. Key contributions include accurate prediction of evaporation degree (≈0.5–3% deviation across 10–80% weathering) and reconstruction of initial compositions with deviations around 4% for volatile components. Future directions include incorporating more training data (especially for highly weathered samples), using more accurate activity coefficient models for concentrated residuals, extending to conditions approaching supercritical regimes (e.g., via phi–phi VLE approaches), and eventually integrating case-specific interfering effects such as adsorption, extinguishing media, pyrolysis, and microbial degradation.

Limitations
  • Interfering effects such as preferential adsorption to substrates, combustion byproducts, pyrolysis, microbial degradation, and fire extinguishing media are neglected; results represent an idealized evaporation-only scenario.
  • Assumptions include well-mixed liquid, ideal gas phase, equal gas-phase mass transfer coefficient for all components, and constant temperature.
  • COSMO-RS activity coefficient predictions carry typical errors (<10% nonpolar, <30% polar) and may deteriorate at high concentrations in highly weathered residues.
  • The gamma–phi framework is limited to non-supercritical conditions; extrapolation or phi–phi methods would be required near supercritical regimes.
  • Oxygenated components were not measured experimentally and were added at typical concentrations, so their share of each modeled mixture is assumed rather than measured.
  • Training data are from gasoline samples collected in the Netherlands in 2011; generalizability to other regions/formulations may require additional datasets.
  • Model accuracy decreases somewhat for highly weathered samples due to near-complete loss of volatile components and sparser effective training information.