logo
ResearchBunny Logo
RadonPy: automated physical property calculation using all-atom classical molecular dynamics simulations for polymer informatics

Engineering and Technology

RadonPy: automated physical property calculation using all-atom classical molecular dynamics simulations for polymer informatics

Y. Hayashi, J. Shiomi, et al.

This paper introduces RadonPy, an innovative open-source library that automates all-atom classical molecular dynamics simulations to uncover the properties of over 1000 amorphous polymers. Conducted by researchers including Yoshihiro Hayashi and Junichiro Shiomi, the study validates MD results against experimental data, identifies polymers with high thermal conductivity, and showcases the power of machine learning in enhancing polymer informatics.... show more
Introduction

Materials informatics relies on large, high-quality datasets, yet open databases for polymers lag behind those for inorganic materials and small molecules. Existing polymer resources (e.g., PoLyInfo and Polymer Genome) are limited in scope, density, and accessibility, with sparse multi-property records and few APIs. Generating computational polymer property data at scale is technically and computationally challenging due to complex MD workflows, parameter choices, and long run times. This study introduces RadonPy, an open-source Python library that fully automates all-atom classical MD for polymers, enabling high-throughput computation of a comprehensive set of properties for amorphous polymers. The research aims to validate MD-calculated properties against experiments, calibrate MD biases using machine learning, and demonstrate how such data can accelerate polymer informatics, including identifying polymers with high thermal conductivity relevant to thermal management applications.

Literature Review

The paper situates RadonPy within the context of existing materials databases and polymer informatics tools. For inorganic and small-molecule systems, large open databases such as Materials Project, AFLOW, OQMD, and QM9 have enabled major advances via high-throughput first-principles computations and API-driven access. In polymers, PoLyInfo is the largest curated experimental database (~100 properties for >18,000 polymers) but is sparse and lacks APIs. Polymer Genome provides first-principles computed electronic/optical properties for crystalline/single-chain polymers (e.g., bandgaps, dielectric constants, refractive indices) and some experimental amorphous properties, but coverage and property diversity are limited. Prior high-throughput polymer MD efforts are few; notable work by Afzal et al. computed T_g and thermal expansion for 315 polymers. Workflow-level tools like pysimm can streamline parts of MD, but no open-source software previously automated the end-to-end pipeline (structure generation, equilibration, NEMD, property extraction, error handling) across diverse polymers. Transfer learning between computational and experimental domains has shown promise in materials science, including previous work by the authors on predicting thermal conductivity of amorphous polymers using neural networks trained on computational properties.

Methodology

Software overview: RadonPy (Python 3.7–3.9) interoperates with RDKit for cheminformatics, Psi4 for quantum chemistry, and LAMMPS for MD. Inputs include a SMILES string for the repeating unit (with two asterisks marking connection points), degree of polymerization, number of chains, and temperature. The pipeline automates conformer search, DFT-based charge/polarizability calculation, chain and cell generation, force-field assignment, packing and equilibration, NEMD for thermal conductivity, equilibrium property calculations, convergence checking, restart scheduling, and data output (trajectories, thermodynamics, CSV properties, pickled final states). Properties computed: density, radius of gyration (Rg), Cp, Cv, (iso)thermal and (iso)entropic compressibility and bulk modulus, linear and volume expansion coefficients, self-diffusion, refractive index, static dielectric constant, nematic order parameter, thermal conductivity, and thermal diffusivity. Dataset: From 15,335 PoLyInfo homopolymers containing H, C, N, O, F, P, S, Cl, Br, I, 1,138 unique amorphous homopolymers were selected to maximize overlap with experimental properties (amorphous or unidentified postforming states, neat resin, 273–323 K measurements, linear or unidentified topology). Five independent MD runs per polymer were attempted; successful automation occurred ≥1 time for 1,070 polymers, ≥3 times for 1,001, and all five for 759. Conformer search (RDKit): Up to 1,000 3D conformers generated via ETKDGv2; GAFF2-based MM optimization and clustering by Butina on torsion fingerprint deviation; top 4 conformers optimized by DFT (ωB97M-D3BJ/6-31G(d,p)); lowest-energy conformer selected. Electronic properties (Psi4): RESP charges from HF/6-31G(d) single-point; HOMO, LUMO, dipole from ωB97M-D3BJ/6-311G(d,p) (LanL2DZ for I); isotropic polarizability via finite-field at ωB97M-D3BJ/6-311+G(2d,p) (basis variations for Br, I). Polymer chain generation: Self-avoiding random walk connects repeating units; head/tail bonds aligned coaxially and anti-parallel; capped H charges summed to bonded atoms to ensure neutrality. Chains contained ~1,000 atoms (degree of polymerization varies), producing similar molecular weights across polymers; tacticity set to atactic. Force field: GAFF2 with Lennard-Jones and bonded terms; modified parameters for fluorocarbons (Träg & Zahn). Automatic parameter assignment; missing angle terms estimated per GAFF2 heuristics. Amorphous cell generation: Ten polymer chains randomly placed and rotated to form initial ~10,000-atom cell at 0.05 g cm−3. Packing simulation (LAMMPS): NVT 1 ns heating 300→700 K (Nosé-Hoover), then 1 ns NVT isotropic cell reduction to 0.8 g cm−3 at 700 K; Coulomb off, LJ cutoff 3.0 Å to avoid globule-like collapse; SHAKE constraints on all bonds/angles including hydrogens; timestep 1 fs; PBC applied. Equilibration protocol: 21-step compression/decompression (to 50,000 atm, 600 K; down to 1 atm, 300 K) combining NVT and NpT (Nosé-Hoover thermostat/barostat) over ~1.5 ns, followed by ≥5 ns NpT at 300 K, 1 atm; equilibrium assessed every 5 ns. Equilibrium criteria: relative SD thresholds on energy components (total, kinetic, bonding, angle, dihedral, vdW, long-range Coulomb), density (<0.1%), and Rg (<1%). Max equilibration 50 ns; failures discarded. Nonbonded twin-range cutoffs 8/12 Å; PPPM for long-range Coulomb; SHAKE enabled. Nematic order parameter required <0.1 to confirm amorphousness. Thermal conductivity (reverse NEMD): Equilibrated cell triplicated in x; box split into 20 slabs; every 200 fs, swap velocities between hottest atom in slab N/2 and coldest in slab 0 (Müller-Plathe method). Preheating: 2 ps NVT at 300 K; production: 1 ns NVE; timestep 0.2 fs; SHAKE off; 8/12 Å twin-range; PPPM for Coulomb. Linearity of temperature gradient required (R²≥0.95). Thermal conductivity computed via Fourier’s law from imposed heat flux and measured gradient. Thermal diffusivity from λ/(ρCp). Thermal conductivity decomposition: Energy flux decomposed via modified Irving–Kirkwood formalism into convection, bond, angle, dihedral, improper, and nonbonded (pairwise+K-space) contributions; partial conductivities from partial flux fractions. Property calculations: Standard fluctuation/correlation formulas for ρ, Rg, Cp (enthalpy fluctuations at 1 atm), βT, KT, αp (covariance of V and H, isotropic αL=αp/3), Cv, βs, Ks, D (Einstein relation), refractive index (Lorentz-Lorenz using DFT polarizability and density), static dielectric constant (dipole fluctuations plus electronic contribution n²), nematic order parameter (largest eigenvalue of second-rank tensor). Box-size sensitivity tests on 28 polymers (10–50 chains; 1000–2000 atoms per chain) found negligible size effects under studied conditions. Visualization of chemical space: ECFP6 fingerprints on macrocyclic 10-mers, UMAP with Hamming distance over 15,335 PoLyInfo polymers; subset of 1,070 computed polymers showed similar distribution, indicating minimal selection bias across 21 backbone classes.

Key Findings
  • High-throughput computation: 15 properties calculated for >1,000 amorphous polymers; automated pipeline succeeded at least once for 1,070/1,138 polymers, ≥3 times for 1,001, 5/5 times for 759.
  • Validation against experiments (PoLyInfo): • Density: Strong correlation, R²=0.890; slight underestimation (fit slope 0.805). N=382. • Thermal conductivity: Moderate correlation, R²=0.490; variance increases for higher λ. N=34. • Refractive index: Strong correlation, R²=0.809; slight underestimation (slope 0.839). N=107. • Specific heat capacity Cp: Moderate correlation, R²=0.602; systematic overestimation (slope 1.430) attributed to missing quantum effects in classical MD. N=66. • Linear expansion coefficient: Weak correlation, R²=0.178. N=165. • Volume expansion coefficient: Weak correlation, R²=0.217. N=144.
  • Property distributions (N≈1070): Thermal conductivity ranged 0.082–0.619 W·m−1·K−1 (mean 0.240). Thermal diffusivity 2.96×10−8–2.27×10−7 m²·s−1. Density 0.742–1.914 g·cm−3 (mean 1.133). Refractive index 1.274–1.839 (mean 1.550). Cp mean ≈3086 J·kg−1·K−1. Additional statistics reported for bulk moduli, compressibilities, expansion coefficients, dielectric constants, diffusion.
  • Eight amorphous polymers identified with high λ (>0.4 W·m−1·K−1) and low inter-run SD (<0.05 W·m−1·K−1), including polyethylene (P11, 0.456 W·m−1·K−1) consistent with PoLyInfo (0.39–0.53), poly(vinyl alcohol) (P1241, 0.439 vs PoLyInfo 0.31), poly(vinylene carbonate) (P1305), Kevlar-like aromatic polyamide (P1626), and several aromatic polyimides (e.g., P1687, P1711, P1715, P1093).
  • Mechanistic decomposition of λ: • Polyethylene (P11): large bond-angle bending contribution. • PVA (P1241) and PVCarbonate (P1305): dominant nonbonded contributions from dense hydrogen bonds and dipole–dipole interactions. • Kevlar-like P1626: mixed contributions from covalent (bond/angle/dihedral) and nonbonded (H-bonds/dipoles) due to rigid backbone and amide groups. • Aromatic polyimides (P1687, P1711, P1715, P1093): strong bond-stretching contributions reflecting rigid, linear backbones; P1093 also shows notable nonbonded contribution due to amide functionality. • Convection contribution showed little correlation with total λ; bond, angle, dihedral, and nonbonded components correlate positively with λ.
  • Multivariate trends: • λ approximately proportional to scaled radius of gyration (Rg/M^0.6), generalizing prior findings from amorphous polyethylene to diverse polymers. • Specific heat inversely proportional to density (consistent with Dulong–Petit reasoning and mean atomic weight trends). • Refractive index for amorphous polymers appears dominated by polarizability rather than density (per Lorentz–Lorenz), with high-n polymers exhibiting large π-conjugated backbones. • Pareto frontier suggests difficulty achieving simultaneously high Cp and low λ in amorphous polymers; unexplored regions exist for low λ with high n and high λ with low n.
  • Machine-learning calibration (transfer learning): Pretrain on MD data then fine-tune on experimental data (shotgun transfer). For Cp, linear and volume expansion coefficients, transferred models substantially reduced bias and variance relative to direct MD-to-experiment comparisons; Cp systematic bias nearly eliminated.
Discussion

The study demonstrates that a fully automated, end-to-end MD workflow can reliably generate multi-property datasets for amorphous polymers at scale and that these simulations reproduce experimental trends for several key properties (density, refractive index, thermal conductivity). Discrepancies due to intrinsic limitations of classical MD (e.g., missing quantum effects for Cp) and system-size/timescale constraints (for expansion coefficients) can be mitigated through transfer learning that leverages abundant MD data to calibrate against sparse experiments. The identification and mechanistic decomposition of eight high-thermal-conductivity amorphous polymers elucidate design principles—densely hydrogen-bonding motifs and rigid, linear backbones—that can guide targeted synthesis and screening. Observed multivariate relationships (e.g., λ–Rg scaling, Cp–density trade-offs, polarizability-dominated refractive index) provide interpretable structure–property insights and highlight Pareto frontiers relevant to materials design. Overall, the results support the feasibility and utility of constructing large computational property databases for polymers, akin to those that accelerated inorganic materials informatics, and underscore the importance of integrated computation–ML approaches to bridge model–experiment gaps.

Conclusion

RadonPy provides an open, automated pipeline for computing diverse physical properties of polymers via all-atom classical MD, enabling high-throughput data generation across broad chemical space. Applied to >1,000 amorphous homopolymers, the workflow delivered validated density, refractive index, and thermal conductivity, revealed systematic biases (e.g., Cp) amenable to ML calibration, and uncovered eight amorphous polymers with exceptionally high thermal conductivity along with their underlying transport mechanisms. The platform and resultant datasets are poised to accelerate polymer informatics, including transfer learning from computational to experimental domains and data-driven materials design. Future work includes expanding property coverage (e.g., glass transition temperature, dielectric loss tangent, cohesive energy density), establishing property-specific automated protocols (particularly for mechanically and viscoelastically dominated behaviors), generating temperature- and molecular-weight-dependent profiles, improving experimental benchmark datasets via curation or new measurements, addressing data storage/formatting at scale (e.g., selective trajectory retention, adoption of BigSMILES for copolymers/branched systems), and extending to more complex polymer states (oriented, blends, copolymers).

Limitations
  • Model–experiment mismatch: Differences in degree of polymerization and its distribution, chain orientation/crystallinity, impurities, and entanglement can cause discrepancies, notably for thermal conductivity and expansion coefficients.
  • Classical MD limitations: Absence of quantum effects leads to systematic Cp overestimation; accurate treatment would require quantum corrections or alternative methods.
  • Finite-size and timescale constraints: Limited cell sizes and trajectories contribute to high variability for linear/volume expansion coefficients; some properties (mechanical/viscoelastic) are not reliably captured under current settings.
  • Computational cost and robustness: Equilibration can fail (e.g., lack of convergence, partial orientation, non-linear NEMD gradients); large resource demands (tens of hours per polymer) restrict throughput.
  • Data storage and formatting: Large trajectory files (~20 GB per polymer) challenge database storage; current pipeline focuses on linear polymers represented by SMILES, necessitating advanced notations (e.g., BigSMILES) for copolymers and branched architectures.
  • Scope: Current validation is comprehensive for six properties but limited for others; broader experimental datasets with consistent protocols are needed to refine calculation conditions and benchmarking.
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny