Gas permeability, diffusivity, and solubility in polymers: Simulation-experiment data fusion and multi-task machine learning
B. K. Phan, K. Shen, et al.
This research by Brandon K. Phan and colleagues introduces a multi-tiered multi-task learning framework that predicts gas permeability, diffusivity, and solubility in polymers. By fusing experimental and simulation data, the study improves model generalizability and predictive accuracy, especially in underexplored chemical spaces.
Introduction
Polymer membranes are critical for separations in applications such as carbon capture, water purification, drug delivery, and packaging. A key performance metric is gas permeability, defined by the solution–diffusion model as the product of gas diffusivity and solubility. Accurately predicting permeability across diverse gases and polymer chemistries would accelerate discovery of high-performance membranes. Experimental measurements via constant-volume permeation are accurate but resource-intensive, while classical molecular simulations can generate data at scale but with lower fidelity due to force-field limitations and practical time-scale constraints. Prior ML efforts in polymer gas transport have advanced from simple, hand-crafted features to learned fingerprints, yet models often struggle to extrapolate to new chemical spaces. The research question addressed here is whether multi-task learning and multi-fidelity data fusion—integrating experimental and simulated data across permeability, diffusivity, and solubility for multiple gases—can improve prediction accuracy and generalizability beyond single-task, experiment-only models. The purpose is to build a unified, generalizable predictor leveraging correlations among related properties and across data fidelities to overcome data scarcity and extrapolation challenges, thereby enabling broader, faster polymer screening.
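For reference, the solution–diffusion relation mentioned above can be written out as follows. The subscripts A and B (two gases) and the ideal selectivity symbol α are introduced here for illustration; they follow the standard textbook definition and are not notation taken from the paper.

```latex
P = D \cdot S,
\qquad
\alpha_{A/B} = \frac{P_A}{P_B} = \frac{D_A}{D_B} \cdot \frac{S_A}{S_B}
```

Written this way, a permeability selectivity splits into a diffusivity-selectivity factor and a solubility-selectivity factor, which is why predicting D and S alongside P allows the two contributions to be examined separately.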
Literature Review
Early ML studies (e.g., Wessling et al.) correlated limited descriptors such as spectra or experimental conditions with permeability, achieving initial success but limited transferability. Subsequent work grew datasets (hundreds of polymers, multiple gases) and used imputation (e.g., MICE) to fill missing permeability values. A perspective by Ricci et al. summarized strategies and challenges for ML-driven membrane design. Feature engineering evolved from hand-crafted fingerprints to learned fingerprinting from repeat-unit structures, enabling richer structure–property modeling. However, extrapolation beyond known polymer–property space remained problematic. Multi-task learning in polymer informatics has been explored to share information across properties and tasks, but prior approaches often focused on permeability alone or combined dissimilar property types without explicitly integrating simulation data. Multifidelity information fusion has been effective in other polymer property domains (e.g., bandgap, crystallization tendency). This work expands on these by fusing high-fidelity experimental data with low-fidelity but abundant simulations and jointly learning permeability, diffusivity, and solubility across multiple gases.
Methodology
Data curation and scope: Experimental permeability (P), diffusivity (D), and solubility (S) for six gases (CO2, CH4, O2, N2, H2, He) were collected from 84 publications (Polymer Handbook sources). Experiments span 25–35 °C and 1–30 atm. The experimental dataset includes 820 polymers and 5007 measurements (3748 Pexp, 709 Dexp, 550 Sexp). High-throughput classical simulations generated additional data: 533 Psim, 581 Dsim, and 667 Ssim across 357 polymers.
Simulation pipeline: Structures were built from polymer repeat-unit SMILES using the Polymer Structure Predictor (PSP). Each system comprised 27 methyl-capped polymer chains (~150 atoms per chain) in a cubic box. GAFF2 was used for the polymers, and TraPPE models for CO2, CH4, O2, and N2, which were treated as rigid molecules. A 21-step equilibration protocol ensured conformational relaxation and density convergence; simulated densities followed experimental trends with a minor underestimation. Diffusivity (Dsim): 27 gas molecules were inserted per system to stay in the dilute, Fickian regime. A 10 ns NPT equilibration was followed by a 100–200 ns NVT production run with a Nosé–Hoover thermostat/barostat, a 1 fs timestep, and output every 1 ps. D was obtained from a linear fit to the long-time, block-averaged mean squared displacement (MSD), with uncertainties taken from the block averaging. Solubility (Ssim): Widom insertion was performed by Monte Carlo (MC) on a 5 ns NVT production run, using 50 snapshots (one every 100 ps) and 25,000 random insertions per snapshot. The excess chemical potential yields Henry’s constant, which was converted to solubility at 1 atm (IUPAC standard). Insertions with energies >5 kBT were discarded. 25 configurations were used to estimate the mean, standard deviation, and standard error, and results were screened for SE <5%.
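The two estimators named in the pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the choice to fit the last half of the MSD curve, and the toy inputs are all hypothetical. It uses the three-dimensional Einstein relation (MSD ≈ 6Dt at long times) and the standard Widom expression for the excess chemical potential; the energy cutoff, block averaging, and the conversion to Henry's constant described above are not reproduced.

```python
import math

def diffusivity_from_msd(times, msd, fit_start_fraction=0.5):
    """Estimate D from the 3D Einstein relation MSD(t) ~ 6*D*t.

    A least-squares line is fitted to the tail of the curve; the
    fit_start_fraction parameter is an assumption made for this sketch.
    """
    start = int(len(times) * fit_start_fraction)
    t = times[start:]
    y = msd[start:]
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    return slope / 6.0

def widom_excess_chemical_potential(insertion_energies_kT):
    """Widom estimate in units of kT: mu_ex/kT = -ln <exp(-dU/kT)>."""
    boltzmann = [math.exp(-u) for u in insertion_energies_kT]
    return -math.log(sum(boltzmann) / len(boltzmann))

# Toy data (not from the paper): a perfectly linear MSD with D = 0.002.
times = [float(i) for i in range(100)]
msd = [6.0 * 0.002 * t + 0.5 for t in times]
print(diffusivity_from_msd(times, msd))

# Toy insertion energies (in kT), purely illustrative.
print(widom_excess_chemical_potential([0.5, 1.0, 2.0, 4.0]))
```

The returned diffusivity carries the units of the inputs (for example Å²/ps if the MSD is in Å² and time in ps), so any unit conversion is left to the caller.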
Validation of simulations: Simulated properties compared against available experimental values for overlapping systems: 308 systems for P, 326 for D, 343 for S. Simulations generally overestimated absolute values (attributed to slightly lower simulated densities/free volume effects and Widom/force-field limitations) but captured trends across polymer–gas chemistries.
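A comparison of this kind can be sketched as below. The summary does not state whether correlations were computed on raw or log-scaled values, so the use of log10 here is an assumption, as are the function names and the made-up numbers. The mean log10 ratio is added as one simple way to expose systematic overestimation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def compare_sim_to_expt(sim, expt):
    """Return (r, bias) on a log10 scale.

    bias > 0 means the simulation overestimates on average,
    expressed in orders of magnitude.
    """
    log_sim = [math.log10(v) for v in sim]
    log_expt = [math.log10(v) for v in expt]
    r = pearson_r(log_sim, log_expt)
    bias = sum(s - e for s, e in zip(log_sim, log_expt)) / len(sim)
    return r, bias

# Made-up values (not from the paper): simulation uniformly 3x too high.
expt = [1.0, 10.0, 100.0, 1000.0]
sim = [3.0 * v for v in expt]
r, bias = compare_sim_to_expt(sim, expt)
print(r, bias)
```

In this toy case the trend is captured perfectly while the magnitudes are offset, which is the qualitative pattern the validation describes.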
Model architecture (polyGNN): Inputs are polymer SMILES and a selector vector encoding data fidelity (experiment/simulation), property (P, D, S), and gas identity. The encoder transforms SMILES into a periodic graph; atoms and bonds receive initial fingerprints. A message-passing block with skip connections iteratively updates node embeddings, and a graph aggregation step produces a learned polymer fingerprint. The fingerprint, concatenated with the selector, is fed into an MLP estimator that outputs the target property. The model uses leaky ReLU activations and dropout (with MC dropout for uncertainty estimation), and was trained in PyTorch and PyTorch Geometric with the Adam optimizer and an MSE loss. An ensemble of five cross-validated submodels averages predictions.
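One plausible way to build such a selector is a concatenation of one-hot codes, sketched below. The summary only says that the selector encodes fidelity, property, and gas; the one-hot layout, the ordering, and every name in this snippet are assumptions made for illustration.

```python
# Hypothetical vocabularies; the ordering is arbitrary.
FIDELITIES = ("exp", "sim")
PROPERTIES = ("P", "D", "S")
GASES = ("CO2", "CH4", "O2", "N2", "H2", "He")

def one_hot(value, vocabulary):
    """Return a one-hot list for value, or raise if it is unknown."""
    if value not in vocabulary:
        raise ValueError(f"{value!r} is not one of {vocabulary}")
    return [1.0 if v == value else 0.0 for v in vocabulary]

def selector_vector(fidelity, prop, gas):
    """Concatenate the three one-hot codes into one 11-element selector."""
    return (one_hot(fidelity, FIDELITIES)
            + one_hot(prop, PROPERTIES)
            + one_hot(gas, GASES))

def estimator_input(fingerprint, fidelity, prop, gas):
    """Append the selector to a polymer fingerprint, as fed to the MLP."""
    return list(fingerprint) + selector_vector(fidelity, prop, gas)

# A made-up 4-dimensional fingerprint stands in for the learned one.
x = estimator_input([0.1, -0.3, 0.7, 0.0], "sim", "D", "CH4")
print(len(x), x)
```

Because the same fingerprint is reused with different selectors, a single network can be queried for any fidelity, property, and gas combination for a given polymer.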
Training protocol and benchmarking: Data were grouped by property, gas, and fidelity, and min–max scaled per group. Splits were stratified by polymer SMILES so that all gas data for a held-out polymer remain in the test set. Test fractions of 20%, 40%, 60%, and 80% were used, with the larger fractions emulating data-scarce training. Model capacity (number of message-passing steps) was chosen using NNDebugger to overfit the training data where possible, or otherwise to maximize R². Hyperparameters (batch size, learning rate, dropout) were optimized with scikit-optimize on a hyperparameter (HP) validation split. Five-fold cross-validation (CV) produced the ensemble, and test sets remained unseen during HP optimization and CV. Metrics: coefficient of determination (R²) and order-of-magnitude error (OME; the mean absolute error of log10-scaled values).
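The polymer-level split and per-group scaling can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a plain random split over unique polymers rather than a stratified one, it fits the scaling bounds on the training portion only (the summary does not say which portion was used), and the record layout and function names are hypothetical.

```python
import random

def split_by_polymer(records, test_fraction, seed=0):
    """Hold out whole polymers so no SMILES appears in both splits."""
    polymers = sorted({r["smiles"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(polymers)
    n_test = round(len(polymers) * test_fraction)
    test_polymers = set(polymers[:n_test])
    train = [r for r in records if r["smiles"] not in test_polymers]
    test = [r for r in records if r["smiles"] in test_polymers]
    return train, test

def group_key(record):
    return (record["property"], record["gas"], record["fidelity"])

def fit_group_minmax(records):
    """Collect (min, max) of the target per (property, gas, fidelity)."""
    bounds = {}
    for r in records:
        lo, hi = bounds.get(group_key(r), (r["value"], r["value"]))
        bounds[group_key(r)] = (min(lo, r["value"]), max(hi, r["value"]))
    return bounds

def scale(record, bounds):
    """Min-max scale one record; raises KeyError for an unseen group."""
    lo, hi = bounds[group_key(record)]
    if hi == lo:
        return 0.0
    return (record["value"] - lo) / (hi - lo)

# Illustrative records: five repeat units, two gases, made-up values.
smiles = ["[*]CC[*]", "[*]CC(C)[*]", "[*]CC(Cl)[*]", "[*]CCO[*]",
          "[*]CC(c1ccccc1)[*]"]
records = [{"smiles": s, "property": "P", "gas": g, "fidelity": "exp",
            "value": float(i + 1) * factor}
           for i, s in enumerate(smiles)
           for g, factor in (("CO2", 10.0), ("N2", 1.0))]

train, test = split_by_polymer(records, test_fraction=0.4)
bounds = fit_group_minmax(train)
print(len(train), len(test), [round(scale(r, bounds), 2) for r in train])
```

Splitting on polymers rather than on individual measurements is what keeps every gas measurement of a held-out polymer out of training.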
Benchmark models: ST: trained only on Pexp. MT-1: Pexp plus Psim (fusing fidelities for the same property). MT-2: Pexp plus Dexp and Sexp (fusing correlated properties, high-fidelity). MT-3: Pexp augmented with Dexp, Sexp, and Psim, Dsim, Ssim (full multi-task, multi-fidelity, multi-gas). A production model adopting MT-3 was trained on the entire fused dataset for deployment and large-scale screening.
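The four benchmark configurations differ only in which slices of the data they are allowed to train on, which can be expressed as a small lookup table. The dictionary name, record fields, and helper function below are hypothetical; only the composition of each benchmark is taken from the description above.

```python
# (fidelity, property) pairs each benchmark model trains on.
BENCHMARKS = {
    "ST":   {("exp", "P")},
    "MT-1": {("exp", "P"), ("sim", "P")},
    "MT-2": {("exp", "P"), ("exp", "D"), ("exp", "S")},
    "MT-3": {("exp", "P"), ("exp", "D"), ("exp", "S"),
             ("sim", "P"), ("sim", "D"), ("sim", "S")},
}

def training_subset(records, model_name):
    """Keep only the records a given benchmark model may train on."""
    allowed = BENCHMARKS[model_name]
    return [r for r in records
            if (r["fidelity"], r["property"]) in allowed]

# One placeholder record per (fidelity, property) combination.
records = [{"fidelity": f, "property": p}
           for f in ("exp", "sim") for p in ("P", "D", "S")]

for name in BENCHMARKS:
    print(name, len(training_subset(records, name)))
```

All four models are evaluated on the same target (Pexp), so differences in score reflect the value of the extra training slices.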
Key Findings
- Dataset scale: Experimental measurements total 5007 (3748 Pexp, 709 Dexp, 550 Sexp) across 820 polymers. Simulations add 1781 entries (533 Psim, 581 Dsim, 667 Ssim) across 357 polymers, yielding 6788 total fused datapoints and 1050 unique polymers for the production model.
- Simulation validation: Correlations between simulated and experimental values show positive trends: permeability r ≈ 0.721 (308 systems, 96 polymers), diffusivity r ≈ 0.724 (326 systems, 98 polymers), solubility r ≈ 0.824 (343 systems, 96 polymers). Simulations tend to overestimate magnitudes due to lower simulated densities and Widom/force-field approximations.
- Multi-task vs single-task benchmarking for Pexp prediction:
• ST (experiment-only P): average R² ≈ 0.57; OME ≈ 0.38; performance degrades at high test fractions (e.g., 80% test: R² < 0.50; OME ≈ 0.44).
• MT-1 (add Psim): average R² ≈ 0.77; OME ≈ 0.30; benefits most at 80% test, mitigating extrapolation gaps via simulation data fusion.
• MT-2 (add Dexp, Sexp): average R² ≈ 0.93; OME ≈ 0.12; largest gain from leveraging correlated high-fidelity properties and physics (P = D×S).
• MT-3 (full fusion): average R² ≈ 0.96; OME ≈ 0.10; best overall performance by combining multi-fidelity and multi-property data.
- Production model vs prior Polymer Genome model (holdout of 153 systems, 31 polymers, 13 classes): overall R² improved from 0.93 to 0.95. Class-wise improvements are notable for polyphosphazenes (0.49 → 0.90), polynorbornenes (0.51 → 0.96), polycarbonates (0.75 → 0.98), polysulfones (0.80 → 0.98), and vinyl/vinylidene polymers (0.77 → 0.96). The new model reaches R² ≥ 0.90 for all classes; the only decrease is a slight one for polyimides/polypyrrolones (0.97 → 0.92).
- Chemical space coverage expanded: the dataset grew from 1501 datapoints (315 polymers) to 6788 datapoints (1050 polymers); PCA indicates broader coverage of the chemical space spanned by the ~13,000 known polymers.
- Forward-looking screening: Predictions for ~13,000 known polymers generate Robeson-type trade-off plots for multiple gas pairs. ML predictions align with experimental trends and bounds; simulation data systematically overpredicts as expected. Diffusivity and solubility trade-off plots reveal sensible trends but highlight high uncertainty in low-diffusivity regimes (e.g., occasional CO2/CH4 diffusivity selectivity <1 due to data sparsity), emphasizing the need for caution and uncertainty awareness.
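The R² and OME figures quoted in the benchmarking bullets above can be computed along the following lines. This sketch assumes OME is read as the mean absolute difference between log10-transformed predictions and targets (an interpretation consistent with the metric's name, not confirmed by this summary), and the function names and numbers are illustrative only.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def order_of_magnitude_error(y_true, y_pred):
    """Mean absolute error between log10 values (inputs must be > 0)."""
    diffs = [abs(math.log10(p) - math.log10(t))
             for t, p in zip(y_true, y_pred)]
    return sum(diffs) / len(diffs)

# Made-up permeabilities: every prediction is off by a factor of 2.
measured = [1.0, 10.0, 100.0, 1000.0]
predicted = [2.0, 5.0, 200.0, 500.0]

log_measured = [math.log10(v) for v in measured]
log_predicted = [math.log10(v) for v in predicted]
print(r_squared(log_measured, log_predicted))
print(order_of_magnitude_error(measured, predicted))
```

Under this reading, an OME of 0.10 corresponds to predictions that are typically within a factor of about 1.26 (10^0.10) of the measured value.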
Discussion
The study demonstrates that integrating multi-fidelity (simulation + experiment) and multi-property (P, D, S) data within a unified multi-task graph neural network substantially improves accuracy and generalizability for gas transport predictions in polymers. By encoding physics-informed relationships (P = D×S) and sharing information across gases and properties, the MT models outperform single-task baselines, especially in data-scarce and extrapolative scenarios. The strongest gains arise when correlated, high-fidelity measured properties (Dexp, Sexp) are included, validating the value of leveraging multiple related experimental targets. Simulation data, while lower fidelity and somewhat biased, effectively complements sparse measurements by expanding chemical coverage and allowing the model to learn cross-fidelity calibrations. The production model not only improves permeability prediction accuracy across many polymer classes but also enables concurrent predictions of diffusivity and solubility, thereby supporting deeper analysis (e.g., separate D and S trade-offs) and design insights. Trade-off plots across ~13,000 polymers reveal both promising candidates and regions of high uncertainty, guiding where further experiments or simulations are needed. Overall, the findings address the initial challenge of robust extrapolation to new chemical spaces and establish a scalable path for polymer membrane discovery leveraging data fusion and MT learning.
Conclusion
This work presents a state-of-the-art multi-task, multi-fidelity graph neural network (polyGNN) that fuses experimental and high-throughput simulation data to jointly predict gas permeability, diffusivity, and solubility for multiple gases across diverse polymers. The approach markedly improves predictive performance over single-task, experiment-only models and expands chemical space coverage, as verified by class-wise benchmarks and large-scale screening. Key contributions include: a robust MD/MC simulation pipeline for D and S; validation and calibration of simulations against experiments; a selector-augmented GNN to encode fidelity, property, and gas; and a production-ready model deployed with expanded datasets. Future directions include: expanding experimental coverage of D and S to further strengthen MT benefits; improving and diversifying force fields and simulation protocols (including non-equilibrium MD for permeability); explicitly modeling semicrystalline polymers and amorphous–crystalline interfaces; incorporating processing history and testing conditions; extending to additional gases and temperatures; and systematic uncertainty quantification to guide experiments in data-sparse regions.
Limitations
- Simulation fidelity: Classical force fields (GAFF2, TraPPE) and Widom insertion introduce biases; simulated systems exhibit slightly lower densities, leading to overestimated D and S relative to experiments.
- Data sparsity and coverage: Certain property ranges (especially low diffusivity) and some polymer classes remain underrepresented, inflating uncertainty and occasionally yielding non-physical selectivity predictions.
- Extrapolation risk: Despite improvements, predictions far from the training chemical space require caution and should be accompanied by uncertainty estimates and follow-up validation.
- Scope constraints: Study focuses on six gases and near-ambient conditions (25–35 °C, 1–30 atm); processing history and morphology (e.g., crystallinity) are not explicitly modeled and can strongly influence transport.
- Derived permeability in simulations: Permeability was computed from Dsim×Ssim rather than direct non-equilibrium MD, which may propagate errors from both components.
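The last point can be made concrete with a short unit-conversion sketch for the product of D and S. The input units are assumptions chosen for illustration (D in cm²/s, S in cm³(STP) per cm³ of polymer per atm); the summary does not state the units used in the paper. The Barrer definition and the 76 cmHg per atm factor are standard.

```python
# 1 Barrer = 1e-10 cm^3(STP) * cm / (cm^2 * s * cmHg)
BARRER = 1e-10
CMHG_PER_ATM = 76.0

def permeability_barrer(d_cm2_per_s, s_cm3stp_per_cm3_atm):
    """P = D * S, converted to Barrer (input units are assumed)."""
    s_per_cmhg = s_cm3stp_per_cm3_atm / CMHG_PER_ATM
    return d_cm2_per_s * s_per_cmhg / BARRER

def relative_error_of_product(rel_err_d, rel_err_s):
    """Worst-case relative error of D*S from the component errors."""
    return (1.0 + rel_err_d) * (1.0 + rel_err_s) - 1.0

# Illustrative inputs, not values from the paper.
print(permeability_barrer(1e-7, 0.76))        # roughly 10 Barrer
print(relative_error_of_product(0.20, 0.10))  # roughly 0.32
```

The second function shows why errors compound: a 20% overestimate in D combined with a 10% overestimate in S gives roughly a 32% overestimate in the derived permeability.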