10.6084/m9.figshare.3466652.v1
X.-W. Zhu, Y.-J. Xin, Q.-H. Chen
Chemical and <i>in vitro</i> biological information to predict mouse liver toxicity using recursive random forests
Taylor & Francis Group 2016
Recursive random forests; liver toxicity; QSAR; hybrid model; high-throughput screening
2016-06-29 08:32:35
Dataset
https://tandf.figshare.com/articles/dataset/Chemical_and_i_in_vitro_i_biological_information_to_predict_mouse_liver_toxicity_using_recursive_random_forests/3466652
<p>In this study, recursive random forests were used to build classification models for mouse liver toxicity. The mouse liver toxicity endpoint (67 toxic and 166 non-toxic) was a composite of four <i>in vivo</i> chronic systemic and carcinogenic toxicity endpoints (non-proliferative, neoplastic, proliferative and gross pathology). A multiple under-sampling approach and a shifted classification threshold of 0.288 (non-toxic < 0.288 and toxic ≥ 0.288) were used to cope with the unbalanced data. Our study showed that recursive random forests are very efficient in variable selection and in the development of predictive <i>in silico</i> models. In general, more than 95% of the descriptors were redundant and could be removed from the chemical, biological and hybrid models in this study. The predictive performance of the chemical models (CCR of 0.73) was comparable with that of the hybrid models (CCR of 0.74). Descriptors related to the octanol–water partition coefficient were vital for model performance. The <i>in vitro</i> endpoint of CYP2A2 played a key role in the development and interpretation of the hybrid models. Identifying high-throughput screening assays relevant to liver toxicity would be key to improving <i>in silico</i> models of liver toxicity.</p>
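To make the shifted threshold concrete, the following is a minimal sketch, not the authors' code. It applies the 0.288 cut-off from the abstract to predicted toxicity probabilities and scores the result with CCR, which is assumed here to mean the average of sensitivity and specificity (the abstract does not define it). The function names `classify` and `ccr` and the example probabilities are hypothetical.

```python
from typing import Sequence

THRESHOLD = 0.288  # shifted cut-off quoted in the abstract


def classify(prob_toxic: float, threshold: float = THRESHOLD) -> int:
    """Return 1 (toxic) if the predicted probability is >= threshold, else 0."""
    return 1 if prob_toxic >= threshold else 0


def ccr(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity (assumed definition of CCR)."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute CCR")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical labels and model probabilities, for illustration only.
y_true = [1, 1, 0, 0, 0, 0]
probs = [0.60, 0.30, 0.10, 0.25, 0.40, 0.05]

y_pred = [classify(p) for p in probs]                     # shifted threshold
y_pred_default = [classify(p, threshold=0.5) for p in probs]  # conventional 0.5

print(ccr(y_true, y_pred), ccr(y_true, y_pred_default))
```

In this toy data the lower threshold recovers the second toxic compound at the cost of one false positive, which is the trade-off a shifted threshold makes on a minority class.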