Introduction
Materials informatics (MI) uses machine learning to predict and optimize material properties, and it relies heavily on comprehensive databases. Extensive databases exist for inorganic materials and small organic molecules (e.g., Materials Project, AFLOW, OQMD, QM9), but polymer databases lag significantly behind. The reasons are the high cost and complexity of data generation, the diversity of polymer structures and processing conditions, and concerns about leaking competitive information. Existing polymer databases such as PoLyInfo® (manually compiled from the literature) and Polymer Genome (first-principles calculations) have limitations, such as sparse data or the lack of an application programming interface (API). The scarcity of computational polymer property data also hampers machine learning applications, especially transfer learning techniques that could bridge the gap between limited experimental data and abundant computational data. Existing high-throughput molecular dynamics (MD) simulation datasets for polymers are small and cover only a few properties. RadonPy aims to address these limitations by providing an open-source, fully automated workflow for calculating polymer properties with all-atom classical MD simulations.
Literature Review
The current largest polymer property database, PoLyInfo®, is manually curated and contains limited data points for each polymer. Polymer Genome offers computational data from first-principles calculations for crystalline polymers and some experimental data for amorphous polymers, but lacks an API and has a small sample size. Previous research in high-throughput MD simulations has been limited in scope, focusing on only a few properties (like glass transition temperature and thermal expansion) and a small number of polymers. The existing tools and workflows are insufficient to create a large-scale, comprehensive computational polymer property database for use in materials informatics.
Methodology
RadonPy, a Python library, automates the entire MD simulation process, starting from the SMILES string of a repeating unit. The workflow includes:
(1) conformation search using the ETKDG method;
(2) electronic property calculation (atomic charges, dipole polarizability, HOMO/LUMO energy levels, and dipole moment) using DFT with Psi4;
(3) polymer chain generation through a self-avoiding random walk algorithm;
(4) force field parameter assignment (GAFF2, with modifications for fluorocarbons);
(5) creation of an amorphous simulation cell;
(6) equilibration MD simulations using a 21-step compression/decompression protocol;
(7) non-equilibrium MD (NEMD) simulation for thermal conductivity; and
(8) post-processing property calculations using LAMMPS.
Fifteen properties were calculated (density, radius of gyration, specific heat capacities, compressibilities, bulk moduli, expansion coefficients, self-diffusion coefficient, refractive index, static dielectric constant, nematic order parameter, thermal conductivity, and thermal diffusivity). Results are stored in CSV and pickle formats. The DFT and MD calculations are run with Psi4 and LAMMPS, respectively, through RadonPy interfaces. The calculations were performed on a supercomputer, which allowed many polymers to be processed in parallel.
The study used 1138 homopolymers selected from PoLyInfo to ensure a diverse representation of polymer backbones. Validation compared the MD results against experimental values from PoLyInfo for density, thermal conductivity, refractive index, specific heat capacity, and the linear and volume expansion coefficients. A machine learning approach (transfer learning) was used to calibrate the systematic biases and variances observed in certain properties. Thermal conductivity decomposition analysis was performed using a modified Irving-Kirkwood equation to quantify the contribution of different types of interactions to thermal conductivity.
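To make the chain-generation step (3) more concrete, here is a minimal sketch of a self-avoiding random walk. It is a bead-level toy model, not RadonPy's implementation: each repeating unit is reduced to a single point, and the function names, `bond_length`, `min_dist`, and `max_tries` are all assumptions chosen for illustration. The real workflow places full atomistic repeating units, which this sketch does not attempt.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly distributed on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def self_avoiding_walk(n_units, bond_length=1.0, min_dist=0.9,
                       max_tries=200, seed=0):
    """Grow a chain of n_units beads (hypothetical coarse model).

    Each new bead is placed at bond_length from the previous one in a
    random direction. A trial position is rejected if it comes closer
    than min_dist to any already placed bead other than its bonded
    neighbour.
    """
    rng = random.Random(seed)
    chain = [(0.0, 0.0, 0.0)]
    while len(chain) < n_units:
        last = chain[-1]
        for _ in range(max_tries):
            d = random_unit_vector(rng)
            trial = tuple(last[i] + bond_length * d[i] for i in range(3))
            # chain[:-1] excludes the bonded neighbour from the overlap check
            if all(math.dist(trial, p) >= min_dist for p in chain[:-1]):
                chain.append(trial)
                break
        else:
            # No acceptable position found within max_tries attempts
            raise RuntimeError("walk is trapped; retry with another seed")
    return chain


if __name__ == "__main__":
    chain = self_avoiding_walk(50)
    end_to_end = math.dist(chain[0], chain[-1])
    print(f"{len(chain)} beads, end-to-end distance {end_to_end:.2f}")
```

Calling `self_avoiding_walk(50)` returns a list of 50 coordinate tuples. Rejecting overlapping trial positions is what distinguishes this from a plain random walk, and it is the reason the generated chain can later be packed into a simulation cell without overlapping beads.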
Key Findings
RadonPy successfully automated the calculation of 15 properties for over 1000 amorphous polymers. The calculated density, refractive index, and thermal conductivity showed good agreement (high R²) with experimental values from PoLyInfo. The specific heat capacity correlated strongly with experiment but was systematically overestimated, likely because the classical MD model does not include quantum effects. The linear and volume expansion coefficients correlated only weakly with experiment because of substantial variation in both the computational and the experimental data. Eight polymers with unusually high thermal conductivity (>0.4 W m⁻¹ K⁻¹) were identified, six of which had no previously reported thermal conductivity values. These high-conductivity polymers featured either a high density of hydrogen bonding units or rigid linear backbones. Decomposition analysis revealed that the high thermal conductivity arose from heat transfer via hydrogen bonds and dipole-dipole interactions (for polymers with hydrogen bonding units) and via strong covalent bonds (for polymers with rigid, linear backbones). Transfer learning significantly improved the prediction accuracy of the specific heat capacity, linear expansion coefficient, and volume expansion coefficient compared with the direct MD-calculated values by reducing the systematic biases. The chemical space analysis showed that the calculated polymers covered a wide range of polymer chemistries without significant selection bias.
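The difference between "high correlation" and "good agreement" for the specific heat capacity can be illustrated with a small sketch. It uses an ordinary least-squares linear calibration, which is a deliberately simpler stand-in and not the transfer learning method used in the study. The numbers are synthetic placeholders invented for this example, not results from the paper, and the variable and function names are assumptions.

```python
def linear_fit(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination of y_pred as a predictor of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder values (J kg^-1 K^-1), NOT data from the study.
# The "MD" values track the "experimental" ones closely but sit too high.
exp_cp = [1000.0, 1200.0, 1500.0, 1800.0, 2000.0]
md_cp = [1420.0, 1645.0, 2060.0, 2435.0, 2690.0]

# Fit experiment as a linear function of the MD value, then apply the fit.
slope, intercept = linear_fit(md_cp, exp_cp)
calibrated = [slope * v + intercept for v in md_cp]

print("R2 of raw MD values:       ", round(r_squared(exp_cp, md_cp), 3))
print("R2 after linear calibration:", round(r_squared(exp_cp, calibrated), 3))
```

In this toy setting the raw values are almost perfectly correlated with the reference yet score poorly as direct predictions, because the offset and scale are wrong. A calibration step removes that systematic part of the error. In practice the fit would be evaluated on held-out polymers rather than on the data used for fitting.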
Discussion
RadonPy effectively addresses the challenges in generating large-scale, reliable polymer property data for MI. The high concordance of calculated and experimental values for density, refractive index, and thermal conductivity demonstrates the accuracy and reliability of the automated workflow. The success in identifying novel high-thermal-conductivity polymers exemplifies the potential of RadonPy for materials discovery. The use of transfer learning successfully corrected systematic biases, improving prediction of properties that showed poor correlation otherwise. The identification of polymers with high thermal conductivity and the understanding of their underlying mechanisms provide valuable insights for designing new thermally conductive materials. The study reveals the potential of high-throughput computational approaches for accelerating polymer materials discovery.
Conclusion
RadonPy offers a valuable tool for automating the calculation of polymer properties using all-atom classical MD simulations. This work validated the approach by comparing the calculated values with experimental data and demonstrated that the workflow can identify previously unreported high-thermal-conductivity polymers. Applying transfer learning improved the prediction accuracy for certain properties. Future work could focus on expanding the range of calculable properties, incorporating more complex polymer architectures (e.g., copolymers, branched polymers), and developing more sophisticated calibration methods to further refine the accuracy of predictions. Continued development and use of RadonPy is expected to contribute to advancing polymer informatics and materials discovery.
Limitations
The study focused on linear homopolymers, limiting the applicability to more complex polymer architectures. The accuracy of certain properties, like linear and volume expansion coefficients, was limited by variations in both the experimental data and MD simulations. The study's reliance on existing experimental data from PoLyInfo introduces potential biases due to the quality and consistency of that dataset. Furthermore, the computational cost remains significant, requiring access to high-performance computing resources.