The Challenges of Building Machine Learning Models from DNA-Encoded Library Data

Monday, February 5, 2024

10:00 AM - 10:30 AM EST

Room/Location: 204AB

Robert Stanton, PhD

Pfizer, Massachusetts, United States

Abstract: DNA-encoded libraries (DELs) can consist of billions of tagged compounds that can be evaluated for binding against a protein of interest in a single experiment. They have emerged as an effective tool for small-molecule hit discovery, with successful DEL selections against a variety of protein families, including first-in-class targets. While DEL selections often produce promising hits, the results can contain significant noise due to various experimental factors. Even with known chemistry and available monomers, on-DNA/off-DNA validation and follow-up by project teams can require significant time and resources. One workflow has gained popularity, as reflected in several recent publications: using machine learning models to extract meaningful structure-activity relationships (SAR) from DEL selection data, then applying those models to rapid searches of commercial or proprietary compound collections. In this talk, we will share our recent survey of machine learning models and descriptors for predictions on DEL selection data, emphasizing the challenges that arise with this unique data type.