(1011-D) Advancing Phenotypic Drug Hunting: AI for Biologists
Tuesday, February 6, 2024
2:00 PM – 3:00 PM EST
Location: Exhibit Halls AB
There is growing interest in adopting image-based phenotypic profiling for target and drug discovery. Such high-content approaches yield rich phenotypic data that can reveal critical, holistic insights into the mechanisms of action or toxicity of candidate drugs. Much of this growth has been driven by the use of Cell Painting, a standardized high-content profiling approach.
The method has been adopted so widely that the JUMP (Joint Undertaking in Morphological Profiling) Cell Painting (CP) consortium has been established to generate the largest publicly available reference Cell Painting dataset, with the aim of creating a new data-driven approach to drug discovery. This high-content imaging dataset (~3 million images, ~75 million single cells, 5000+ features) has been generated using 140,000 different conditions with small molecules, CRISPRs and ORFs.
The JUMP-CP dataset has the potential to be an invaluable resource for drug discovery, but the complexity and size of the data make it challenging to use for researchers outside the consortium. The three main barriers are: 1) accessing the images, 2) image analysis, and 3) downstream numerical analysis. To alleviate these challenges, we developed an integrated cloud-based platform with no-code configuration that allows researchers to quickly translate the vast amounts of image data into actionable biological insights.
Our cloud platform includes image storage and registration, as well as a distributed version of CellProfiler (CPUltra) that leverages massively scalable parallel processing to accelerate image analyses, and downstream numeric data analyses. This cloud-based method greatly accelerated the analytics process that is otherwise time-consuming and computationally intensive, thereby allowing the evaluation of complex datasets such as the JUMP-CP. To validate our methods, we applied the original JUMP-CP CellProfiler pipeline to images from sixty 384-well plates.
Numeric outputs were then compared to the original JUMP-CP numeric data using phenotypic Euclidean distance scores of compounds. CPUltra output demonstrated a significant positive correlation with JUMP-CP output, with a coefficient of determination (R²) of 0.823 when sources 2 and 10 were combined. Individually, source 2 and source 10 showed consistent positive correlations between CPUltra results and the JUMP-CP data (R² = 0.801 and 0.886, respectively). We have also developed a deep learning-based label-free segmentation tool to standardize feature extraction. Studies are underway to investigate outcomes from our neural network feature extraction method and to compare them with the original JUMP-CP numeric results.
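The comparison of numeric outputs by phenotypic Euclidean distance can be illustrated with a minimal Python sketch. It is not the consortium's or the platform's actual procedure: the abstract does not specify how the distance scores were computed, so the sketch assumes each compound's score is the Euclidean distance of its feature profile from the centroid of control wells, and that R² is the squared Pearson correlation between the two sets of scores. The function names (`distance_scores`, `r_squared`) and all data are hypothetical and synthetic.

```python
import numpy as np


def distance_scores(profiles, control_profiles):
    """Euclidean distance of each compound profile from the control centroid.

    Assumption: rows are compounds (or control wells), columns are features
    that have already been normalized.
    """
    centroid = control_profiles.mean(axis=0)
    return np.linalg.norm(profiles - centroid, axis=1)


def r_squared(x, y):
    """Squared Pearson correlation between two vectors of scores."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


# Synthetic stand-ins for two pipelines' outputs on the same compounds.
rng = np.random.default_rng(0)
n_compounds, n_features = 50, 20

reference = rng.normal(size=(n_compounds, n_features))
reference_controls = rng.normal(scale=0.1, size=(16, n_features))

# The "reprocessed" output is the reference plus small noise, mimicking a
# second run of the same analysis pipeline on the same images.
reprocessed = reference + rng.normal(scale=0.05, size=reference.shape)
reprocessed_controls = reference_controls + rng.normal(
    scale=0.05, size=reference_controls.shape
)

ref_scores = distance_scores(reference, reference_controls)
new_scores = distance_scores(reprocessed, reprocessed_controls)
r2 = r_squared(ref_scores, new_scores)
print(f"R^2 between pipelines: {r2:.3f}")
```

Comparing per-compound distance scores, rather than individual features, reduces each profile to a single number, which makes agreement between two pipelines easy to summarize but hides feature-level differences.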
To further maximize the utility of the JUMP-CP dataset, we built multiple machine-learning models based on the numeric data and evaluated their predictive performance across different reagent or target classes. Taken together, we show the importance and feasibility of leveraging cloud computing infrastructure, and how it helps make data-based drug discovery accessible to biologists.