Exploring protein hotspots by optimized fragment pharmacophores
D. Bajusz, W. S. Wade, et al.
A team led by Dávid Bajusz used fragment-based drug design to build a pilot library, SpotXplorer0, that identifies hits for challenging targets such as SETD2 and the SARS-CoV-2 proteins 3CLPro and NSP3 macrodomain. The approach integrates pharmacophores with protein hotspot theory and offers a starting point for future drug development.
Introduction
Fragment-based drug discovery (FBDD) screens small, typically polar compounds that often bind weakly but preferentially target protein hotspots, where a few hydrogen bonds of optimal geometry and other key interactions drive binding. Fragments that overlap with the structure-based pharmacophore of a primary hotspot bind robustly. Despite FBDD’s success, questions remain about target specificity and fragment promiscuity: many fragments bind multiple targets, yet conserved hotspots and pharmacophore patterns across target classes imply a limited set of distinct binding pharmacophores. The authors hypothesize that identifying and covering the minimal set of fragment pharmacophores that represent known hotspots can enable small, efficient fragment libraries for broad hit discovery.
In this work, they analyze critical interactions at protein hotspots from protein–fragment complex structures in the PDB to derive fragment pharmacophores. Using these, they design a minimal, diverse library (SpotXplorer0) of commercially available fragments that covers most experimentally validated binding pharmacophores, validate it on established target classes, and probe challenging targets including SETD2 and SARS-CoV-2 3CLPro and NSP3 macrodomain.
Methodology
- Mining experimental structures and hotspot identification: >3300 PDB entries with fragment-sized ligands (10–16 heavy atoms) were filtered to remove covalent labels, sugars, buffers, and additives. FTMap (ATLAS) was used to identify ligands bound at protein hotspots by mapping multiple probe clusters and defining regions where many probes cluster as hotspots.
- Pharmacophore extraction: For each hotspot-bound protein–fragment complex, Schrödinger ePharmacophore extracted up to four pharmacophore features with the largest energetic contributions, scored with Glide XP without changing the binding mode. Feature types are A (hydrogen-bond acceptor), D (hydrogen-bond donor), H (hydrophobic), N (negative ionizable), P (positive ionizable), and R (aromatic ring).
- Two-level clustering to a non-redundant set: The 3584 pharmacophore models were clustered in two steps. (1) Level 1 grouped models with identical feature sets (141 clusters, e.g., DRR). (2) Within each level 1 cluster, 3D alignment followed by hierarchical clustering on pairwise RMSD (complete linkage, 2 Å cutoff) yielded level 2 clusters that represent specific 3D arrangements (e.g., DRR_0). In total, 425 non-redundant hotspot-binding pharmacophores were derived, each represented by its cluster-centroid model.
- Submodel handling and fingerprinting: A pharmacophore submodel is a spatial subset of a larger model; molecules matching a larger model trivially match its submodels. To ensure smaller models are independently represented, two 425-bit fingerprints were created per molecule: one with all matched models, and one with submodel bits zeroed.
- Library assembly and optimization: Vendor fragment collections (including BioBlocks and others) were filtered for size, rotatable bonds, physicochemical properties, and removal of PAINS/problematic features. Molecules were annotated with 2- and 3-point pharmacophore matches. An optimization algorithm selected 96 molecules: initial MaxMin selection on pharmacophore fingerprint distances; then iterative swaps to maximize an objective combining (i) chemical diversity (low mean pairwise fingerprint similarity of molecules), (ii) pharmacophore diversity (low similarity across represented models), and (iii) overall pharmacophore coverage (fraction of models with at least one matching molecule). The final few selections specifically filled missing/underrepresented pharmacophores.
- Experimental validation on established targets: The 96-compound SpotXplorer0 library was screened against three GPCRs (5-HT1A, 5-HT6, 5-HT7) in cell-based radioligand binding assays (HEK293; hits defined as ≥50% inhibition at 10 µM) and against the serine proteases Factor Xa and thrombin in chromogenic assays (Ki determined in follow-up).
- Pharmacophore retrieval benchmarking: For each target, fragment-sized ligands from ChEMBL with at least millimolar activity were processed with the same pharmacophore workflow to define the “known” target pharmacophore set. Pharmacophores retrieved by SpotXplorer0 hits were compared to these sets to compute retrieval percentages for 2- and 3-point models.
- Structural comparison for proteases: All holo structures for thrombin (214 chains) and Factor Xa (118 chains) were analyzed to extract experimental pharmacophores; overlap with the non-redundant set and with SpotXplorer0 hits was quantified. Novel pharmacophore arrangements represented by hits but absent from PDB structures were enumerated.
- Challenging targets and crystallographic screening: SETD2 enzymatic chemiluminescence assay identified inhibitors; cell viability assays (MOLM-13 and MV4-11) assessed cellular effects of SX045. For SARS-CoV-2 3CLPro and NSP3 macrodomain, high-throughput crystallographic fragment screening at Diamond’s XChem was performed by crystal soaking (ECHO acoustic dispensing; ~1–3 h incubation; data to ~1.8 Å for 3CLPro and ~1.1 Å for NSP3). Hits were identified using PanDDA; select compounds were tested for enzymatic inhibition and antiviral activity in infected Vero E6 cells (viral RNA by ddPCR). PDB depositions include 5RHD, 5S4F, 5S4G, 5S4H, 5S4I, 5S4J.
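The fingerprinting and selection steps listed above can be illustrated with a small Python sketch. It is not the authors' implementation: the model names, fragments, and function names are hypothetical, Tanimoto distance is an assumed metric (the summary does not name one), and only submodel masking, a greedy MaxMin seed selection, and the coverage fraction are shown. The iterative swap optimization is left out.

```python
# Toy sketch: submodel masking, greedy MaxMin selection, and coverage.
# All names and data are hypothetical; Tanimoto distance is an assumption.

# Hypothetical pharmacophore models; larger models list their submodels.
MODELS = ["DR_0", "DRR_0", "AH_0", "AHR_0", "NP_0"]
SUBMODELS = {"DRR_0": {"DR_0"}, "AHR_0": {"AH_0"}}

# Hypothetical fragments with every model each one matches.
MATCHES = {
    "frag_a": {"DR_0", "DRR_0"},
    "frag_b": {"DR_0"},
    "frag_c": {"AH_0", "AHR_0"},
    "frag_d": {"DR_0", "DRR_0", "NP_0"},
}


def mask_submodels(matched):
    """Zero the bits of models that are submodels of another matched model."""
    implied = set()
    for model in matched:
        implied |= SUBMODELS.get(model, set())
    return frozenset(matched - implied)


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' bits."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def maxmin_select(fps, k, seed):
    """Greedy MaxMin: repeatedly add the molecule whose nearest
    already-picked neighbour is farthest away."""
    picked = [seed]
    while len(picked) < min(k, len(fps)):
        rest = [m for m in fps if m not in picked]
        nxt = max(
            rest,
            key=lambda m: min(tanimoto_distance(fps[m], fps[p]) for p in picked),
        )
        picked.append(nxt)
    return picked


def coverage(fps, picked, n_models):
    """Fraction of models with at least one matching picked molecule."""
    covered = set()
    for m in picked:
        covered |= fps[m]
    return len(covered) / n_models


# Fingerprints with submodel bits zeroed, as sets of 'on' model names.
masked = {name: mask_submodels(m) for name, m in MATCHES.items()}
picked = maxmin_select(masked, k=3, seed="frag_a")
print(picked, coverage(masked, picked, len(MODELS)))
```

In this toy case frag_d is skipped because its masked fingerprint shares DRR_0 with the seed, so it adds less diversity than the other two fragments. Whether coverage is computed on the masked or the full fingerprint is a choice made here for illustration only.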
Key Findings
- Non-redundant pharmacophore space: Analysis of hotspot-bound fragments in the PDB yielded 425 distinct, non-redundant binding pharmacophores (max 4 features), supporting that fragment–hotspot interactions are represented by a limited set of recurring pharmacophores.
- Library performance and coverage: The 96-compound SpotXplorer0 library covers 76% of unique 2-point and 94% of unique 3-point pharmacophores in the non-redundant set, with high chemical and pharmacophore diversity.
- Retrieval of known target pharmacophores (ChEMBL benchmarking): Although ChEMBL lists up to ~11× more fragment ligands, SpotXplorer0 hits retrieved most known pharmacophores for each target. Reported retrieval percentages (2-pt / 3-pt): 5-HT1A: 80.0% / 51.5%; 5-HT6: 100% / 87.5%; 5-HT7: 64.3% / 46.4%; Thrombin: 78.8% / 54.8%. Factor Xa was not evaluated because too few fragment ligands were available. Average across targets: 80.8% (2-pt) and 60.0% (3-pt). Hits showed limited overlap across targets, supporting selectivity.
- Protease structural concordance: Of X-ray validated pharmacophores, 86.7% (thrombin) and 85.0% (Factor Xa) were represented by SpotXplorer0 hits. For the 58 thrombin fragment complexes, 90.0% of validated pharmacophores were identified by screening. SpotXplorer0 thrombin hits revealed 20 additional 2-point and 39 additional 3-point pharmacophores beyond those seen in PDB fragment complexes, suggesting novel binding poses.
- Hits against challenging targets:
• SETD2: Two fragments inhibited activity (SX045 IC50 300 µM; SX084 IC50 500 µM). SX045 reduced leukemia cell viability (EC50 333 µM in MOLM-13; 400 µM in MV4-11).
• SARS-CoV-2 3CLPro: Fragment SX013 (PDB 5RHD) bound centrally across subsites; enzymatic IC50 31 µM; antiviral EC50 304 µM in Vero E6.
• SARS-CoV-2 NSP3 macrodomain: Five crystallographic hits (SX003, SX005, SX048, SX051, SX054) occupied the adenine or proximal ribose sub-sites of the ADP-ribose binding site. Antiviral EC50 values were in the high micromolar range, the lowest being 136 µM (SX051). The fragments mimic ADP-ribose interactions and provide vectors for growing and merging.
- Topological feature alignment: Distributions of ring types, hybridization, and H-bond donors/acceptors in SpotXplorer0 hits matched those of active fragments from ChEMBL for the same targets, indicating the library captures key chemotypes associated with binding.
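The retrieval percentages above are set-overlap ratios, which a short Python sketch makes concrete. The function name and the example pharmacophore sets are hypothetical; only the per-target percentages are taken from the findings listed above.

```python
def retrieval_percent(known, retrieved):
    """Percentage of a target's known pharmacophores that are also
    represented by at least one screening hit."""
    if not known:
        raise ValueError("no known pharmacophores for this target")
    return 100.0 * len(known & retrieved) / len(known)


# Hypothetical example: 4 of 5 known models are recovered by the hits.
known = {"DR_0", "AR_0", "HR_0", "DRR_0", "PR_0"}
retrieved = {"DR_0", "AR_0", "HR_0", "PR_0", "NP_0"}
example = retrieval_percent(known, retrieved)

# Reported (2-point, 3-point) retrieval percentages per target.
reported = {
    "5-HT1A": (80.0, 51.5),
    "5-HT6": (100.0, 87.5),
    "5-HT7": (64.3, 46.4),
    "Thrombin": (78.8, 54.8),
}
mean_2pt = sum(v[0] for v in reported.values()) / len(reported)
mean_3pt = sum(v[1] for v in reported.values()) / len(reported)
print(example, mean_2pt, mean_3pt)
```

Averaging the rounded per-target values gives roughly 80.8 for 2-point and 60.05 for 3-point models, in line with the reported averages of 80.8% and 60.0%.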
Discussion
The study demonstrates that the practical pharmacophore space of fragment binding to protein hotspots is uneven and limited, due to evolutionary conservation and the dominance of a few high-value interactions. By constructing a protein-based, pharmacophore-optimized fragment library derived from experimentally observed protein–fragment complexes, SpotXplorer0 efficiently maps hotspot interactions across diverse targets. It retrieves the majority of known pharmacophores for established GPCRs and proteases and uncovers new pharmacophore arrangements, indicating potential novel binding poses. The successful identification of cellularly active fragments against SETD2, and crystallographically confirmed and antiviral fragments against SARS-CoV-2 3CLPro and NSP3 macrodomain, further validates the approach. Principal component analysis of pharmacophore fingerprints suggests low redundancy (first 10 PCs explain 45% variance without submodels; 58% with submodels), supporting diverse coverage. Compared to ligand-based library designs of similar size, SpotXplorer0 achieves substantially higher coverage of unique 2- and 3-point pharmacophores, emphasizing the value of anchoring library design in experimentally validated protein-based pharmacophores. The approach is readily maintainable by updating with newly released structures and can generalize to other structure sources.
Conclusion
This work introduces SpotXplorer, a protein hotspot-guided, pharmacophore-optimized strategy for fragment library design. Mining PDB protein–fragment complexes yielded a compact non-redundant set of 425 pharmacophores, from which a 96-compound library (SpotXplorer0) was assembled to maximize pharmacophore coverage and diversity. The library effectively recovers known pharmacophores for established targets and identifies hits for challenging targets, including SETD2 and SARS-CoV-2 3CLPro and NSP3 macrodomain, with structural validation and initial cellular activities. These results show that minimal, pharmacophore-centric libraries can serve as broadly applicable starting points for drug discovery. Future work includes expanding and updating the pharmacophore set with new structural data, addressing protonation and tautomeric state uncertainties with improved protocols, and optimizing the identified fragment hits by growing and merging guided by the pharmacophore vectors.
Limitations
- Dependence on available experimental structures: The non-redundant pharmacophore set is limited to pharmacophores observed in publicly available PDB structures; unseen interactions may be underrepresented until new data are added.
- Physicochemical state assignment: Potential inaccuracies in assigning protonation states, tautomers, and charges, as well as protein-induced polarization effects, may affect pharmacophore extraction and matching despite the implemented pKa- and H-bond–aware protocols.
- Target coverage biases: Structural and ligand data are richer for some target classes (e.g., proteases) than others, potentially biasing pharmacophore frequencies and library composition.
- Early-stage activities: Many identified fragments display weak to moderate potencies (high µM to mM), requiring substantial optimization for therapeutic relevance.