(1321-B) Informatics resources and databases to support identification of E3 ligase ligands using machine learning
Monday, February 5, 2024
2:00 PM – 3:00 PM EST
Location: Exhibit Halls AB
Abstract: E3 ligases are enzymes involved in ubiquitin-mediated protein degradation. Predicting compounds (ligands) that bind selectively to these E3 ligases can assist in the design of novel Proteolysis Targeting Chimeras (PROTACs), a drug modality based on targeted protein degradation (TPD). The discovery of such PROTACs has gained traction in recent years. Many methods for identifying PROTACs have been established, but these methods are specific to certain E3 ligase classes. Furthermore, only a few combinatorial approaches for generating PROTACs exist, and these are limited to well-studied targets. To tackle both issues, we built a simple and effective multi-class machine learning (ML) model.
We gathered a dataset of known E3 ligands (with their respective targets) from the literature and patents as the training set for our model. To achieve this, we merged data from three PROTAC resources: PROTAC-DB 2.0, PROTACpedia, and an internally curated subset of Proximity Degraders (Evolvus Proximers database - https://www.evolvus.com/PD.html), in which the E3 ligands bound to active PROTACs were structurally identified and assigned. Additionally, we expanded the chemical space of E3 ligands for two targets (DDB1 and DCAF1) that were missing from these PROTAC collections. Together, this yielded a total of 643 unique ligands. To prepare inputs for our ML model, we converted the E3 ligands into five different molecular fingerprints and constructed a multi-class XGBoost classification model for each. All models exhibited strong performance, achieving an accuracy of over 90%. The model trained with 3D pharmacophoric fingerprints, Extended Reduced Graph (ErG), outperformed the others with an accuracy of 94% and a Cohen's kappa score of 0.88. Moreover, for each of the fingerprint-based models, we identified the fingerprint bits that contributed most to the predictions, which helps decipher the chemical features relevant for designing future E3 ligands.
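The two metrics reported above, accuracy and Cohen's kappa, can be illustrated with a minimal pure-Python sketch. The function name `accuracy_and_kappa` and the ten toy labels are hypothetical and are not data from this study; the class names simply reuse the E3 ligase classes named in this abstract. Kappa corrects the observed agreement for the agreement expected by chance, which matters when classes are imbalanced.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of samples where prediction matches truth.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each class, the product of its marginal
    # frequencies among the true labels and among the predictions.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Only one class on both sides: kappa is undefined.
        return observed, float("nan")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical toy labels (NOT results from the study).
y_true = ["CRBN"] * 5 + ["VHL"] * 3 + ["other"] * 2
y_pred = ["CRBN"] * 4 + ["VHL"] * 4 + ["other"] * 2

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f}, kappa={kappa:.3f}")
```

In this toy case one CRBN ligand is misassigned to VHL, so nine of ten labels agree, while the kappa value is lower than the accuracy because part of that agreement would be expected by chance.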
To demonstrate the applicability of our ErG-based XGBoost model, we applied it to a commercial degrader library, the Asinex molecular degrader collection, which contains 1,257 compounds. The results revealed that this collection is heavily skewed toward probable CRBN binders (66%), with only 2% predicted to bind VHL. Surprisingly, about 32% of the compounds were assigned to the mixed "other" E3 ligase class. Our analysis demonstrates that commercial libraries have the potential to deliver novel TPD candidates, although this potential is limited when compound selection is oriented toward only a small subset of E3 ligases. Ref: https://pubs.acs.org/doi/full/10.1021/acsomega.3c02803
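The library profiling step described above amounts to tallying predicted classes and converting the counts to percentages. The sketch below assumes the model's predictions are available as a plain list of class labels; the function name `class_shares` is hypothetical, and the 100 placeholder labels are constructed only to mirror the percentages reported in this abstract, not taken from the actual per-compound predictions.

```python
from collections import Counter


def class_shares(predicted_labels):
    """Return {label: percentage of the library}, largest class first."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions supplied")
    return {label: 100.0 * count / total for label, count in counts.most_common()}


# Placeholder predictions (synthetic, chosen to mirror the reported split).
predictions = ["CRBN"] * 66 + ["other"] * 32 + ["VHL"] * 2

shares = class_shares(predictions)
for label, pct in shares.items():
    print(f"{label}: {pct:.0f}%")
```

A summary like this makes the skew of a screening collection visible at a glance before any compounds are selected for follow-up.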