Data driven discovery of conjugated polyelectrolytes for optoelectronic and photocatalytic applications
Y. Wan, F. Ramirez, et al.
This study by Yangyang Wan, Fernando Ramirez, Xu Zhang, Thuc-Quyen Nguyen, Guillermo C. Bazan, and Gang Lu investigates conjugated polyelectrolytes (CPEs) using machine learning and high-throughput first-principles calculations. It identifies structural features that predict the properties of CPEs outside the database and screens candidates for optoelectronic devices and photocatalysts.
Introduction
Conjugated polyelectrolytes (CPEs) are macromolecules with π-conjugated backbones, ionic-functionalized alkyl side chains, and counter ions. Their delocalized π-electrons impart optical and charge-transport properties, while ionic groups ensure solubility in high-dielectric media, notably water. Narrow bandgap CPEs (NBCPEs), formed by alternating electron-rich and electron-poor backbone units, can exhibit intramolecular charge-transfer states and self-doping in aqueous media when bearing anionic sulfonate side groups. Despite significant potential, systematic theoretical/first-principles studies are scarce, leaving unclear how electrostatic forces and individual structural components (backbone donor/acceptor units, side-chain length, ionic group, counter ion) govern optoelectronic properties and polaron stabilization, and hindering rational CPE design.
This work addresses these gaps via a data-centric approach that combines machine learning with high-throughput first-principles calculations to build a CPE database and derive structure–property relationships. Given the high-dimensional design space enabled by modular CPE structures, the study investigates how HOMO, LUMO, and HOMO–LUMO gap (Eg) depend on individual components, identifies key structural features via machine learning, and uses them as descriptors to predict properties of CPEs beyond the database. Finally, guided by these relationships, the study screens and identifies promising CPEs as hole-transport materials for perovskite optoelectronics and as photocatalysts for overall water splitting.
Methodology
Database construction: Exploiting the modularity of CPEs, a first-principles database (>2000 CPEs overall; focus here on 1296 anionic CPEs) was generated by combining 3 donors (CPDT, P, PDT) and 9 acceptors (BT, PT, BBT, FBT, Ph, PhF, Py, PhCN, Tp) into 27 backbones, with 4 alkyl chain lengths (C4, C6, C8, C10), 3 anionic groups (COO−, SO3−, PO3H), and 4 counter cations (Li, Na, K, Cs).
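The combinatorial size of the anionic design space can be checked with a short enumeration. This is a minimal sketch, not the authors' structure-generation script: the label format is an assumption modeled on names such as CPDT-PhF-C8-SO3Na that appear in the screening results, and `enumerate_cpes` is a hypothetical helper.

```python
from itertools import product

# Building blocks as listed in the text.
DONORS = ["CPDT", "P", "PDT"]
ACCEPTORS = ["BT", "PT", "BBT", "FBT", "Ph", "PhF", "Py", "PhCN", "Tp"]
CHAINS = ["C4", "C6", "C8", "C10"]
ANIONS = ["COO", "SO3", "PO3H"]
CATIONS = ["Li", "Na", "K", "Cs"]


def enumerate_cpes():
    """Return one label per donor/acceptor/chain/anion/cation combination.

    The label layout (donor-acceptor-chain-anion+cation) is an assumption
    chosen to match names like "CPDT-PhF-C8-SO3Na".
    """
    return [
        f"{d}-{a}-{c}-{x}{m}"
        for d, a, c, x, m in product(DONORS, ACCEPTORS, CHAINS, ANIONS, CATIONS)
    ]


labels = enumerate_cpes()
backbones = {(d, a) for d, a in product(DONORS, ACCEPTORS)}
print(len(backbones))  # 3 * 9 = 27
print(len(labels))     # 27 * 4 * 3 * 4 = 1296
```

The count reproduces the 27 backbones and 1296 anionic CPEs quoted above.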
Electronic-structure calculations: Density functional theory (DFT) with plane-wave basis sets using VASP was employed. PAW pseudopotentials were used; PBE and HSE06 functionals were considered. For throughput, PBE was used across the database to capture trends, and HSE06 was applied to a subset; screening used PBE for Eg and HSE06 for HOMO with a single conjugation unit, based on benchmarking that PBE underestimates Eg (HSE06/PBE ratio ~1.4) and HOMO (ratio ~1.1). CPEs were modeled primarily as oligomers with a single repeating conjugation unit in cubic cells with 15 Å vacuum in all directions. Initial alkyl chains were set to fully extended trans conformations; geometries were relaxed to 0 K equilibrium (energy and force thresholds: 1e−5 eV and 0.02 eV/Å). Select ab initio molecular dynamics at 300 K confirmed structural stability. van der Waals corrections (PBE+D3) had negligible impact on energetics and structures.
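The benchmarked HSE06/PBE ratios suggest a rough way to translate PBE trends to hybrid-level estimates. The sketch below assumes a simple multiplicative correction with the approximate ratios quoted above; the function name and the input values are hypothetical, and the summary does not state that the authors applied exactly this rescaling.

```python
# Approximate HSE06/PBE ratios quoted in the text. They are rough trend
# corrections, not a substitute for an actual hybrid-functional calculation.
HSE06_OVER_PBE = {"Eg": 1.4, "HOMO": 1.1}


def estimate_hse06(pbe_value, quantity):
    """Scale a PBE value (in eV) by the quoted HSE06/PBE ratio."""
    return pbe_value * HSE06_OVER_PBE[quantity]


# Hypothetical PBE inputs, for illustration only.
print(round(estimate_hse06(1.50, "Eg"), 2))     # 2.1
print(round(estimate_hse06(-4.60, "HOMO"), 2))  # -5.06
```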
Feature engineering and analysis: Initial features for machine learning included donor/acceptor frontier levels (HOMOD, LUMOD, HOMOA, LUMOA), alkyl chain length (L), electronegativity of anionic group (χ−) and counter cation (χ+), and the anionic group coordination number (CN). Pearson correlation coefficients (ρ) quantified linear relationships, and Random Forest Regression (RFR) provided non-linear feature importance (Q). Subsequently, structure-linked (revised) features replaced the donor/acceptor frontier levels: degree of unsaturation of donor and acceptor (DUD, DUA; number of rings and π-bonds), differential electronegativity between heteroatom and carbon on the ring (ΔD1, ΔA1), and between substituent and hydrogen (ΔD2, ΔA2; ΔD2 is constant here and was omitted). Ionic functionality descriptors (CN, L, χ−, χ+) were retained; χ− was later dropped because of its negligible influence.
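For reference, the Pearson coefficient ρ used for the linear analysis can be computed as in the sketch below. It uses only the standard library; the numbers are made-up toy values rather than data from the study, and the non-linear RFR importance Q is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy values (hypothetical, in eV): a donor-level feature and a CPE HOMO.
homo_donor = [-5.2, -5.0, -4.8, -4.6]
homo_cpe = [-5.1, -4.9, -4.8, -4.5]
rho = pearson(homo_donor, homo_cpe)
print(f"rho = {rho:.3f}")  # positive: the two quantities rise together
```

A value near +1 or −1 indicates a strong linear relationship; values near 0 indicate that a feature has little linear influence, which is why a non-linear measure such as RFR importance is used alongside it.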
Property prediction: Support Vector Regression (SVR, scikit-learn) with 10-fold cross-validation and grid search was used to predict the HOMO and LUMO of CPEs. Performance metrics included mean absolute percentage error (MAPE), root-mean-square error (RMSE), and R2. The 1296 CPEs were split into 1046 training and 250 test samples; an external verification set of additional CPEs (beyond the 1296) further assessed generalization.
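The three reported metrics follow their standard definitions, sketched below in plain Python so the formulas are explicit. The example arrays are hypothetical; in practice the same quantities are available from scikit-learn's metrics module.

```python
from math import sqrt


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (eV here)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical HOMO values (eV): reference vs. model prediction.
homo_ref = [-5.10, -4.90, -4.70, -4.50]
homo_pred = [-5.05, -4.95, -4.72, -4.46]
print(f"MAPE = {mape(homo_ref, homo_pred):.2f}%")
print(f"RMSE = {rmse(homo_ref, homo_pred):.3f} eV")
print(f"R2   = {r2(homo_ref, homo_pred):.3f}")
```

Because HOMO and LUMO are several eV in magnitude, a MAPE of a few percent corresponds to errors on the order of 0.1 eV, which is why RMSE is reported alongside it.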
Screening criteria: For hole-transport materials (HTMs) in perovskite devices, candidates were required to be more stable than CPE-K (atomization energy < −0.239 eV), show energy-level alignment with perovskite and ITO (HOMO between perovskite VBM and ITO VBM; LUMO higher than the perovskite conduction band, ~ −3.7 eV, to block electrons), and have hole reorganization energy λ < 0.186 eV (Marcus theory). For photocatalysts for overall water splitting, candidates required HOMO < −5.87 eV and LUMO > −4.24 eV to straddle the OER/HER redox levels with ~0.2 eV overpotentials, Eg < 2.4 eV for solar absorption, and atomization energy lower than that of CPE-K.
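The energy-level part of the photocatalyst screen reduces to three inequalities, sketched below. The thresholds are those quoted above; the function name is hypothetical, and the stability (atomization energy) criterion is deliberately left out because those values are not given in this summary.

```python
# Thresholds quoted in the text (eV, relative to vacuum).
HOMO_MAX = -5.87  # HOMO must lie below this level (OER side)
LUMO_MIN = -4.24  # LUMO must lie above this level (HER side)
EG_MAX = 2.4      # upper bound on the gap for solar absorption


def passes_level_screen(homo, lumo):
    """Check the redox-straddling and gap criteria for overall water splitting.

    Stability relative to CPE-K is NOT checked here.
    """
    gap = lumo - homo
    return homo < HOMO_MAX and lumo > LUMO_MIN and gap < EG_MAX


# Two CPEs whose levels are reported in the screening results.
print(passes_level_screen(-5.910, -3.881))  # P-FBT-C4-SO3Li -> True
print(passes_level_screen(-5.136, -2.845))  # CPDT-PhF-C8-SO3Na -> False
```

The second compound is reported as an HTM candidate rather than a photocatalyst; its HOMO is too shallow to drive oxygen evolution, so it fails this screen.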
Code and data: Structure-generation scripts at https://cpegenome.com; data in article/Supplementary Information and database (https://cpegenome.com); additional raw data available from the corresponding author.
Key Findings
- Database and variability: Systematic PBE calculations over 1296 anionic CPEs show that within a given backbone, HOMO/LUMO vary by ~0.5 eV whereas Eg varies by ~0.1 eV, indicating Eg is primarily backbone-determined while ionic functionality strongly modulates HOMO/LUMO. Across all CPEs studied, HOMO ranges from −5.77 to −4.17 eV and LUMO from −3.71 to −1.77 eV; the lowest transport gap observed is 1.32 eV.
- Side-chain length effects: For neutral CPEs (ionic functionality removed, CPE*), alkyl chain length (e.g., C4 vs C6; also C8 vs C10) has negligible impact on HOMO/LUMO/Eg.
- Donor/acceptor correlations: HOMO of CPE* positively correlates with donor HOMO (HOMOD); LUMO of CPE* correlates with acceptor LUMO (LUMOA). Eg shows a positive but weaker correlation with (LUMOA − HOMOD), consistent with charge density localization (LUMO on acceptor; HOMO delocalized on donor and acceptor) and Bader charge analyses (hole: +0.63e donor/+0.37e acceptor; electron: −0.31e donor/−0.69e acceptor for CPE-K*).
- Electrostatic modulation: For fixed backbone/ionic group, HOMO decreases (becomes deeper) with increasing chain length and with increasing counter-ion electronegativity; neutral CPE* defines the asymptotic limit as chain length → ∞ (electrostatics vanish). The anionic group’s proximity leads to stronger π-orbital polarization than the counter ion. Notable exceptions occur when small, mobile counter ions (Li, Na) approach the backbone (especially with short chains and SO3−), reversing trends (HOMO below HOMO*). LUMO shows similar qualitative trends; Eg is weakly affected by electrostatics.
- Feature importance via ML: RFR and Pearson analyses indicate Eg depends primarily on backbone features: HOMOD, LUMOA (dominant), with additional influence from LUMOD and HOMOA; HOMOD is anti-correlated with Eg. LUMO of CPE depends almost exclusively on LUMOA; donor features have negligible influence. HOMO depends comparably (with opposite signs) on HOMOD, LUMOD, and HOMOA, reflecting donor–acceptor interaction. Among ionic descriptors, anionic group coordination number (CN) is more important than side-chain length (L), χ−, and χ+; χ− has negligible influence and is excluded subsequently.
- Revised, structure-linked features: Using DUD, DUA, ΔD1, ΔA1, ΔA2 with ionic descriptors, correlations show: to lower Eg, decrease donor degree of unsaturation (DUD) and/or increase acceptor unsaturation (DUA), and increase electronegativity of heteroatoms on the ring (ΔA1) and substituents (ΔA2). HOMO increases with lower DUD and substituent electronegativity on donor, and higher heteroatom electronegativity on the ring. LUMO is governed by acceptor features; raising LUMO entails reducing acceptor unsaturation and electronegativity of ring heteroatoms and substituents.
- Predictive performance: SVR predictions of HOMO/LUMO achieve test-set MAPE ≤2.07%, RMSE ≤0.07 eV, and R2 ≥0.97 (1046/250 train/test). On an external verification set, HOMO MAPE ~1.16% (RMSE ~0.09 eV, R2 ~0.91) and LUMO MAPE ~3.63% (RMSE ~0.14 eV, R2 ~0.90), demonstrating generalization.
- Screening outcomes:
• Hole-transport materials (HTMs): 72 candidates satisfied stability, alignment, and λ criteria; five top candidates by lowest λ include CPDT-PhF-C8-SO3Na (HOMO −5.136 eV, LUMO −2.845 eV, λ 0.131 eV) and related variants. Longer alkyl chains generally reduce hole reorganization energy compared to CPE-K (C4), suggesting increased side-chain length can improve HTM performance.
• Photocatalysts for overall water splitting: 17 candidates identified meeting redox straddling, Eg < 2.4 eV, and stability; top five include P-FBT-C4-SO3Li (HOMO −5.910 eV, LUMO −3.881 eV, Eg 2.029 eV). All promising photocatalysts feature P as the donor; engineering acceptor units (increasing ΔA1 and ΔA2) enables deeper HOMO and suitable gaps for overall splitting.
Discussion
The study establishes clear, quantitative structure–property relationships in conjugated polyelectrolytes by disentangling backbone and ionic-functionality effects. It shows that while the backbone dictates the transport gap Eg via donor/acceptor electronic structures, electrostatic interactions from pendant ionic groups and counter ions significantly tune HOMO and LUMO, thereby enabling energy-level alignment for specific applications. Electrostatic polarization provides a unifying physical explanation for observed trends, including chain-length and counter-ion effects and exceptions when small, mobile counter ions approach the backbone.
Machine learning identifies the most influential, structure-tied features and demonstrates accurate prediction of frontier orbital energies for unseen CPEs, enabling rapid screening. The derived guidelines directly inform design: selecting donor/acceptor pairs to set Eg and LUMO (acceptor-driven), while adjusting donor features and ionic architecture to tune HOMO and reduce reorganization energy.
These insights were leveraged to identify practical candidates: dozens of HTMs with appropriate alignment and reduced λ for perovskite devices, and multiple photocatalysts that straddle water redox potentials with suitable gaps. The results address the initial challenge of navigating the vast CPE design space and provide actionable paths for rational synthesis and device integration.
Conclusion
A first-principles, data-centric framework integrating high-throughput DFT and machine learning was developed to map the high-dimensional structural space of conjugated polyelectrolytes to key optoelectronic properties. The work: (1) constructed a comprehensive anionic CPE database; (2) established that Eg is backbone-dominated while HOMO/LUMO are tunable via electrostatics from ionic functionalities; (3) identified critical structural descriptors—both frontier-orbital-based and directly structure-linked—that govern HOMO, LUMO, and Eg; (4) built accurate ML models (SVR) for predicting HOMO/LUMO of unseen CPEs; and (5) discovered promising CPE candidates as HTMs for perovskite devices and as photocatalysts for overall water splitting, along with practical design rules (e.g., increasing side-chain length to reduce λ; engineering acceptor features to meet redox straddling).
Future research should expand the database and incorporate experimental datasets to improve model fidelity, and develop advanced models that capture solvent effects, defects, interfacial wettability, polaron transport, and excited-state dynamics to further refine screening and design.
Limitations
- Data and model scope: While extensive, the database focuses on anionic CPEs and primarily oligomers with a single conjugation unit; infinite-chain effects and morphology are only partially explored. PBE underestimates Eg; scaling and selective HSE06 mitigate but do not replace full hybrid-level coverage.
- Structural sampling: Initial conformations assume extended trans side chains; local minima and finite-temperature conformational diversity are only partly addressed via limited AIMD.
- Machine learning limitations: Feature-property relationships include non-linear effects; although RFR and SVR help capture them, the models are trained on computed data and have limited experimental validation. Ionic-group electronegativity was found to be negligible in this dataset but may matter in broader chemistries.
- Device-relevant properties: Screening criteria approximate complex processes; solvent interactions, defects, interfacial energetics/wettability, polaron lifetimes, and charge-transport couplings are not fully captured and may affect real-world performance.