(1206-C) Absolute quantification of polar analytes using machine learning and StandardCandles™ as universal calibrators
Tuesday, February 6, 2024
12:00 PM – 1:00 PM EST
Location: Exhibit Halls AB
Abstract: The quantitation of endogenous small molecules plays a pivotal role in a diverse array of applications. For example, determining nutrient concentrations is a cornerstone of monitoring and optimizing cell culture conditions, improving efficiency, and minimizing waste in various processes. Quantifying key compounds helps maintain quality and consistency throughout production processes. Metabolite concentrations are a valuable tool for tracking the activation or inhibition of specific pathways and are highly relevant to research aimed at advancing human health and combating disease.
Absolute quantification requires careful consideration of experimental design, sample preparation, analytical methods, and data analysis. Traditional methods have relied upon liquid chromatography-mass spectrometry (LC-MS) to separate individual metabolites so that accurate peak detection and integration can be achieved. Accurate quantification requires many data points across each peak, and the ability to measure the absolute concentration of analytes is further limited by the need to procure expensive stable isotope-labeled standards or to prepare laborious calibration curves. We have introduced the ability to harness the power of machine learning (ML) to go directly from raw data to analyte concentrations. The first application of the technology is specifically designed for the absolute quantification of polar analytes important in metabolic processes. We present this novel approach for LC-MS quantitation.
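For context, the traditional calibration-curve workflow mentioned above can be sketched as follows. This is a minimal illustration, not the method presented in this abstract: the standard concentrations, peak areas, and the `quantify` helper are hypothetical, and a simple unweighted linear fit is assumed.

```python
# Hypothetical external calibration standards (illustrative values only):
# known concentration in uM and the integrated LC-MS peak area for each.
standard_conc = [1.0, 5.0, 10.0, 50.0, 100.0]
standard_area = [2050.0, 10100.0, 19900.0, 100500.0, 199800.0]

# Ordinary least-squares fit of area = slope * concentration + intercept.
n = len(standard_conc)
mean_x = sum(standard_conc) / n
mean_y = sum(standard_area) / n
s_xy = sum((x - mean_x) * (y - mean_y)
           for x, y in zip(standard_conc, standard_area))
s_xx = sum((x - mean_x) ** 2 for x in standard_conc)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def quantify(peak_area: float) -> float:
    """Invert the calibration line to estimate a concentration in uM."""
    return (peak_area - intercept) / slope


# Estimate the concentration of an unknown sample from its peak area.
unknown_area = 40000.0
print(f"Estimated concentration: {quantify(unknown_area):.2f} uM")
```

Each analyte needs its own set of standards and its own fit, which is the overhead that a direct raw-data-to-concentration model aims to avoid.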
To generate high-quality, labeled training data sets for the ML model, we have created an in-house data generation pipeline. The sample sets include polar analytes spiked into a variety of matrices such that the concentration of each analyte in every injection is known. Spectra of the resultant mixtures are acquired on three identically configured LC-MS systems. A QC pipeline for analyzing the data using traditional methods has been developed to ensure data quality. The raw spectra, accompanied by the StandardCandles spectra, are entered into our proprietary deep learning model, which uses a state-of-the-art transformer architecture, to directly predict absolute concentrations.
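The idea of a spike-in design in which every injection carries known concentrations can be sketched as below. This is only an illustration under stated assumptions: the analyte and matrix names, the concentration range, the number of levels, and the level-rotation scheme are all hypothetical and are not taken from the actual pipeline.

```python
import itertools

# Hypothetical placeholders; the abstract does not name analytes or matrices.
analytes = ["analyte_A", "analyte_B", "analyte_C"]
matrices = ["matrix_1", "matrix_2"]


def log_spaced_levels(low: float, high: float, count: int) -> list[float]:
    """Return `count` concentrations spaced evenly on a log scale."""
    ratio = (high / low) ** (1 / (count - 1))
    return [low * ratio ** i for i in range(count)]


# Assumed range: 0.1 to 100 uM, i.e. three orders of magnitude.
levels = log_spaced_levels(0.1, 100.0, 7)


def build_injections() -> list[dict]:
    """Build one labeled record per (matrix, level) combination."""
    injections = []
    for matrix, idx in itertools.product(matrices, range(len(levels))):
        # Assumption: rotate the level per analyte so that analyte
        # concentrations are not identical within a single injection.
        concentrations = {
            name: levels[(idx + offset) % len(levels)]
            for offset, name in enumerate(analytes)
        }
        injections.append({"matrix": matrix, "concentrations_uM": concentrations})
    return injections


for record in build_injections()[:3]:
    print(record)
```

Each record pairs a matrix with a known concentration for every analyte, which is the label a supervised model would be trained against once the corresponding spectra are acquired.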
Our proprietary data set now exceeds 3,400,000 spectra and includes trained concentrations for more than 150 analytes over a dynamic range of three orders of magnitude. We were able to impute slightly more than 95% of the concentrations from a held-out test set. In addition, the resultant model was stable enough to continuously impute concentrations on sample sets acquired using the same method, but on an “unseen” instrument and a novel matrix (i.e., NIST SRM 1950).