Bioplastic design using multitask deep neural networks
C. Kuenneth, J. Lalonde, et al.
Christopher Kuenneth, Jessica Lalonde, Babetta L. Marrone, Carl N. Iverson, Rampi Ramprasad, and Ghanshyam Pilania develop multitask deep neural network property predictors and use them to identify promising biodegradable alternatives to non-degradable plastics. The approach is aimed at reducing reliance on petroleum-based commodity plastics.
Introduction
The study addresses the urgent need to replace petroleum-based plastics, a major source of persistent waste and microplastic pollution, with sustainable bioplastics that match application-relevant performance. Polyhydroxyalkanoates (PHAs) are promising bio-derived and biodegradable polymers with tunable thermal, mechanical, and gas transport properties via variations in backbone length, side-chain length, and functional groups, as well as via copolymerization. However, the vast chemical design space of PHAs and their copolymers makes traditional experimental or computational screening impractical. The research goal is to develop data-driven, multitask deep learning property predictors to efficiently navigate this large chemical space and identify PHA-based bioplastic candidates that can replace commonly used petroleum-based plastics while meeting similar performance requirements.
Literature Review
Prior work has explored PHA synthesis, processing, and modification, demonstrating that PHA chemistry (e.g., backbone length, side-chain functional groups) can modulate properties such as Tg, Tm, Td, E, strength, and elongation. Previous polymer informatics studies have built predictive models for specific properties (e.g., Tg and Tm) and highlighted the potential of multitask learning and meta learners to improve performance by leveraging cross-property correlations. The literature also documents the industrial relevance of PHAs, the challenges of experimental high-throughput screening, and the limitations of computationally expensive simulations (DFT, MD) for large-scale searches. Biosynthetic routes to aromatic PHAs and chemical synthesis of PHA and PHA-containing copolymers have been reported, indicating a maturing landscape for translating in silico discoveries to synthesis.
Methodology
Data curation: The authors compiled 22,731 experimental data points for homo- and copolymers covering 13 properties in three categories: thermal (Tg, Tm, Td), mechanical (E, σy, σb, εb), and gas permeability (μO2, μCO2, μN2, μH2, μHe, μCH4). Of these, 7,456 are copolymer data points spanning over 1,440 distinct copolymer chemistries; the remaining 15,275 are homopolymer data points. When multiple measurements existed for a polymer, values were averaged after manual curation. For consistency, only Tg and Tm values measured by DSC, Td values measured by TGA, and mechanical properties measured near room temperature (300 K) were used. All copolymer entries are random copolymers. Outliers were flagged using DBSCAN clustering (scikit-learn) for manual inspection. Property values were min-max scaled to [0, 1] for training and inverse-transformed before computing metrics; gas permeabilities were additionally transformed as log10(x + 1) because their values follow power-law distributions.
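The scaling steps described above can be illustrated with a minimal sketch. It uses only the Python standard library; the function names, the sample permeability values, and the order of operations (log transform first, then min-max scaling) are assumptions made for illustration and are not taken from the authors' code.

```python
import math

def log_transform(values):
    """Apply log10(x + 1), the transform described for gas permeabilities."""
    return [math.log10(v + 1.0) for v in values]

def inverse_log_transform(values):
    """Undo log10(x + 1)."""
    return [10.0 ** v - 1.0 for v in values]

def fit_min_max(values):
    """Return the (lo, hi) bounds used to map values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant column")
    return lo, hi

def scale(values, bounds):
    """Map values onto [0, 1] using previously fitted bounds."""
    lo, hi = bounds
    return [(v - lo) / (hi - lo) for v in values]

def unscale(values, bounds):
    """Map scaled values back to the original range."""
    lo, hi = bounds
    return [v * (hi - lo) + lo for v in values]

# Hypothetical O2 permeabilities in barrer (illustrative values only).
perm_o2 = [0.0, 9.0, 99.0, 999.0]

# Assumed order: log transform first, then min-max scaling.
logged = log_transform(perm_o2)
bounds = fit_min_max(logged)
scaled = scale(logged, bounds)

# Metrics are reported on inverse-transformed values, so the round trip
# back to the original units should recover the inputs.
restored = inverse_log_transform(unscale(scaled, bounds))
```

Fitting the bounds once and reusing them for the inverse step keeps training targets and reported metrics in consistent units.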
Predictive modeling: Three multitask deep neural network predictors with meta learners, one per property category, were trained using five-fold cross-validation to predict the 13 properties simultaneously, exploiting inter-property correlations. Inputs were fingerprints representing polymer chemistry and composition (including SMILES-encoded comonomers for copolymers), designed to be smooth and well-conditioned for learning. Performance metrics (RMSE, R2) were computed on validation sets. Model implementation details and hyperparameter optimization are provided in the Methods and Supplementary Information of the original paper.
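The evaluation setup (five-fold splits, RMSE, and R2 on validation data) can be sketched in plain Python. This is not the authors' implementation: the splitting scheme, the function names, and the example Tg values are hypothetical, and the neural network itself is omitted.

```python
import math
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        val = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, val

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Each sample lands in exactly one validation fold.
splits = list(k_fold_indices(10, k=5))

# Hypothetical measured vs. predicted Tg values in kelvin.
tg_true = [300.0, 350.0, 400.0, 450.0]
tg_pred = [310.0, 340.0, 410.0, 440.0]
tg_rmse = rmse(tg_true, tg_pred)
tg_r2 = r2_score(tg_true, tg_pred)
```

In a real run, a model would be fitted on each `train` index list and scored on the matching `val` list, with the per-fold metrics then aggregated.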
Bioplastic search space generation: The candidate space includes 540 PHAs constructed by varying backbone and side-chain carbon counts (n, m = 1–6) and 15 side-chain terminal functional groups (6 × 6 × 15 = 540), plus 13 conventional polymers (chosen as representative commodity plastics). Copolymer candidates are generated as the outer product of PHAs with PHAs and PHAs with conventional polymers on an 11-point composition grid (c = 0, 0.1, ..., 1), where the endpoints c = 0 and c = 1 correspond to homopolymers. The full space contains 1,373,503 polymers: 553 homopolymers, 1,309,770 PHA–PHA copolymers (145,530 distinct PHA pairs at nine intermediate compositions), and 63,180 PHA–conventional copolymers (7,020 pairs at nine intermediate compositions). UMAP projections of fingerprint subspaces were used to visualize chemical organization and continuity across compositions.
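The stated total of 1,373,503 candidates can be reproduced with a short count. The counting convention here is an assumption rather than something the summary spells out: PHA–PHA pairs are treated as unordered, and the grid endpoints c = 0 and c = 1 are treated as homopolymers, so each pair contributes only the nine intermediate compositions.

```python
from math import comb

N_PHA = 540           # PHA homopolymers in the search space
N_CONVENTIONAL = 13   # conventional polymers

# Composition grid c = 0, 0.1, ..., 1.
compositions = [i / 10 for i in range(11)]

# Assumption: c = 0 and c = 1 reduce to homopolymers, leaving 9 copolymer
# compositions per monomer pair.
intermediate = [c for c in compositions if 0.0 < c < 1.0]

homopolymers = N_PHA + N_CONVENTIONAL

# Assumption: PHA-PHA pairs are unordered, because (A, B, c) and
# (B, A, 1 - c) describe the same random copolymer.
pha_pha = comb(N_PHA, 2) * len(intermediate)

pha_conventional = N_PHA * N_CONVENTIONAL * len(intermediate)

total = homopolymers + pha_pha + pha_conventional
print(homopolymers, pha_pha, pha_conventional, total)
```

Under these assumptions the three counts add up exactly to the reported size of the search space.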
Candidate selection protocol: Predicted properties were computed for all candidates using the meta learners. A two-step selection identified bio-replacements for seven commodity plastics (PE, PP, PVC, PET, PS, Nylon 6, PEN). Step 1: nearest neighbors search (Scikit-learn) retrieved the five closest candidates (within the multi-property space) in each subgroup (PHA-only and PHA–conventional) to the target polymer’s average experimental property vector (from PoLyInfo, standard conditions). Step 2: domain-expert filtering prioritized candidates with plausible synthesis routes (biosynthetic or chemical). The final selections are reported, with the full set of 70 candidates provided as Supplementary Data; code and prediction datasets are shared on GitHub.
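Step 1 of this protocol can be sketched as a brute-force nearest-neighbor search. The source reports using scikit-learn; the plain-Python version below is a stand-in, and the Euclidean metric, the assumption that properties are already scaled to [0, 1], and all candidate names and property vectors are hypothetical.

```python
import math

def nearest_candidates(target, candidates, k=5):
    """Return the k (name, distance) pairs closest to the target vector.

    Assumes Euclidean distance in a property space whose axes have been
    scaled to comparable ranges, so no single property dominates.
    """
    def dist(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, target)))

    ranked = sorted(candidates.items(), key=lambda item: dist(item[1]))
    return [(name, dist(vec)) for name, vec in ranked[:k]]

# Hypothetical scaled property vector of a target commodity plastic.
target = [0.40, 0.60, 0.20]

# Hypothetical predicted property vectors, grouped as in the protocol.
subgroups = {
    "PHA-only": {
        "pha_a": [0.41, 0.59, 0.21],
        "pha_b": [0.90, 0.10, 0.80],
        "pha_c": [0.35, 0.65, 0.25],
    },
    "PHA-conventional": {
        "mix_a": [0.40, 0.60, 0.30],
        "mix_b": [0.10, 0.10, 0.10],
    },
}

# The protocol retrieves five neighbors per subgroup; k=2 keeps the toy
# example small.
shortlist = {
    group: [name for name, _ in nearest_candidates(target, members, k=2)]
    for group, members in subgroups.items()
}
```

Searching each subgroup separately, as the protocol does, guarantees that both PHA-only and PHA–conventional options appear in the shortlist passed to expert filtering in Step 2.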
Key Findings
- Models: The three multitask DNN meta learners achieved high validation performance overall. Average R2 values were approximately 0.97 (thermal), 0.94 (mechanical), and 0.99 (gas permeability). Example property-wise meta learner metrics: Tg R2=0.98 (RMSE 13.04 K), Tm R2=0.97 (RMSE 16.67 K), Td R2=0.96 (RMSE 23.84 K); E R2=0.94 (RMSE 237.2 MPa), σy R2=0.96 (RMSE 7.1 MPa), σb R2=0.94 (RMSE 9.81 MPa); permeabilities for O2, CO2, N2, H2, He, and CH4 all R2≈0.99–1.00 on the log10 scale.
- Dataset: 22,731 data points (15,275 homopolymers; 7,456 copolymers). Property ranges included Tg 80–873 K, Tm 215–860 K, Td 291–1173 K; E 0.2–4000 MPa; σb 0.04–200 MPa; εb 0.3–995; μO2 5e-6–1000 barrer, μCO2 1e-6–4756 barrer, etc.
- Search space: 1,373,503 candidate polymers from 540 PHAs and 13 conventional polymers across 11 compositions.
- Selections: 14 PHA-based bioplastic candidates (two per target) were identified as potential replacements for seven petroleum-based commodity plastics (PE, PP, PVC, PET, PS, Nylon 6, PEN), which together account for ~75% of annual plastic production in Europe (based on 2019 usage shares). Radar-chart comparisons show close multi-property profiles between targets and selected candidates.
- Physical trends: Predictions reproduce expected correlations: higher Tg correlates with higher Tm and E; σb correlates with E; μCO2 and μO2 are roughly linearly correlated. Commodity plastics’ properties fall within the predicted distributions of the candidate space, though often in distribution tails, underscoring the challenge of matching multi-property profiles.
- Synthesis insights: All selected PHA-only and PHA–conventional candidates feature aromatic side-chain groups; literature reports exist for biosynthesis of aromatic PHAs and chemical copolymerization routes enabling the proposed hybrids.
Discussion
The study demonstrates that multitask deep neural networks, trained on a large curated dataset, can accurately predict key thermal, mechanical, and gas transport properties across diverse polymer chemistries, enabling rapid screening of over a million PHA-based and hybrid candidates. By combining predictive accuracy with a nearest neighbors search in a multi-property space, the approach identifies candidate bioplastics whose property profiles closely match those of widely used petroleum-based polymers, directly addressing the challenge of functional parity required for adoption. The observed property correlations and chemically meaningful organization in fingerprint space support the physical plausibility of predictions. The identified candidates, especially those with aromatic side chains, align with known biosynthetic and chemical synthesis routes, providing actionable paths toward experimental validation. This data-driven design pipeline thus meaningfully advances the search for sustainable, high-performance bioplastics suitable for packaging and other applications with strict thermomechanical and barrier requirements.
Conclusion
The authors present an informatics-based bioplastic design pipeline that uses multitask deep neural networks to predict 13 key polymer properties and screens a 1.37 million-member space composed of PHAs, PHA–PHA copolymers, and PHA–conventional copolymers. The approach yields promising PHA-based replacements for seven major commodity plastics, together representing about 75% of annual plastic use. The pipeline couples nearest-neighbor property matching with synthesizability considerations and outlines feasible biosynthetic and chemical routes for the proposed materials. More broadly, the work shows how polymer informatics can guide targeted experiments, reduce trial-and-error, and accelerate the development of sustainable bioplastics compatible with circular economy goals. Future efforts should integrate synthesizability and process optimization directly into the design loop and expand the training data to capture processing, morphology, and molecular attributes for more accurate property predictions.
Limitations
Model limitations stem primarily from training data availability and scope. The predictors do not account for processing and manufacturing conditions, morphology details such as crystallinity, molecular weight and its distribution, topology (e.g., branching), additives, or subtle configurational effects in copolymers (e.g., sequence distribution, chain morphology). These factors, which can influence measured properties, are not currently integrated. Incorporating such information and explicit synthesizability and process optimization criteria would further improve prediction fidelity and practical relevance.

