In recent years, predictive toxicity models in drug discovery have progressed markedly, propelled by the availability of extensive molecular data and the rapid evolution of machine learning (ML) techniques. Nonetheless, the absence of a comprehensive benchmark tailored to the conditional parameters that shape toxicity, such as concentration and pharmacokinetics, has hindered the advancement and fair comparison of novel ML algorithms. In response, we present MolToxNet, a versatile, machine-learning-ready benchmark dataset curated from diverse sources, including ChEMBL, PubChem, FDA datasets, and scientific publications. MolToxNet spans a diverse array of predictive tasks relevant to real-world drug discovery, along with conditional parameters such as human pharmacokinetics, dose, concentration, and cell line data. It integrates in vitro data essential for toxicity prediction, such as hERG inhibition and microsomal stability, and extends to in vivo outcomes, such as cardiotoxicity labels encompassing both arrhythmia and structural heart damage. It offers curated datasets for protein target prediction that emphasize diverse protein functions beyond inhibition alone. MolToxNet also covers pharmacokinetics data, such as plasma concentration; environmental toxicity data describing the ecological footprint of drug compounds; and a dataset on the protein binding of natural compounds. We set out specific challenges for classification and regression tasks, as well as for multi-task and transfer learning models, along with recommended dataset splits for validation that cover multiple random splits as well as out-of-distribution splits.
Each task is tailored to mirror real-world drug discovery challenges and aims to bridge the gap between machine learning predictions and practical drug development outcomes. We provide preprocessed input features from a wide range of modalities, including structural features, cell imaging, and gene expression. MolToxNet is a collaborative endeavor that pools insights from both industry and academia to offer ML researchers a benchmark for making meaningful contributions to real-world drug discovery.
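
To illustrate what an out-of-distribution split of the kind mentioned above can look like, the following is a minimal sketch of a group-holdout split, in which every compound sharing a group label is assigned wholly to either the training or the test set. It is not the MolToxNet splitting procedure: the function name `group_holdout_split`, the choice of scaffold names as group labels, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def group_holdout_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group appears in both train and test.

    `groups` holds one label per sample (hypothetically, a scaffold name).
    Labels must be hashable and sortable. Whole groups are added to the test
    set until it reaches at least `test_fraction` of the samples.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Sort before shuffling so the result depends only on the seed.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)

    target = test_fraction * len(groups)
    train, test = [], []
    for label in group_ids:
        if len(test) < target:
            test.extend(by_group[label])
        else:
            train.extend(by_group[label])
    return sorted(train), sorted(test)


# Toy example: ten compounds described only by a hypothetical scaffold label.
scaffolds = [
    "benzene", "benzene",
    "indole", "indole", "indole",
    "pyridine", "pyridine",
    "quinoline", "quinoline", "quinoline",
]
train_idx, test_idx = group_holdout_split(scaffolds, test_fraction=0.3, seed=0)
print("train:", train_idx)
print("test:", test_idx)
```

Because whole groups are held out, a model evaluated on the test indices never sees a test scaffold during training, which is what distinguishes this kind of split from a plain random split.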