Chemical and in vitro biological information to predict mouse liver toxicity using recursive random forests

dataset

posted on 2016-06-29, 08:32 authored by X.-W. Zhu, Y.-J. Xin, Q.-H. Chen

In this study, recursive random forests were used to build classification models for mouse liver toxicity. The mouse liver toxicity endpoint (67 toxic and 166 non-toxic) was a composite of four in vivo chronic systemic and carcinogenic toxicity endpoints (non-proliferative, neoplastic, proliferative and gross pathology). A multiple under-sampling approach and a shifted classification threshold of 0.288 (non-toxic < 0.288 and toxic ≥ 0.288) were used to cope with the unbalanced data. Our study showed that recursive random forests are very efficient for variable selection and for the development of predictive in silico models. In general, over 95% of the descriptors were redundant and could be removed from the chemical, biological and hybrid models in this study. The predictive performance of the chemical models (CCR of 0.73) is comparable with that of the hybrid models (CCR of 0.74). Descriptors related to the octanol–water partition coefficient are vital for model performance. The in vitro endpoint of CYP2A2 played a key role in the development and interpretation of the hybrid models. Identifying high-throughput screening assays relevant to liver toxicity would be key to improving in silico models of liver toxicity.
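The handling of the unbalanced data described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, CCR is assumed to mean the mean of sensitivity and specificity (the abstract does not define it), and the under-sampling routine is one plausible reading of "multiple under-sampling": several balanced subsets, each containing all minority-class items plus an equal-sized random draw from the majority class. The 0.288 threshold is taken from the abstract; it is close to the toxic fraction of the data (67/233 ≈ 0.288), although the abstract does not say that this is how it was chosen.

```python
import random

# Threshold from the abstract: toxic if score >= 0.288, non-toxic otherwise.
TOXIC_THRESHOLD = 0.288


def classify(scores, threshold=TOXIC_THRESHOLD):
    """Map predicted toxic-class scores to labels (1 = toxic, 0 = non-toxic)."""
    return [1 if s >= threshold else 0 for s in scores]


def ccr(y_true, y_pred):
    """Correct classification rate.

    Assumption: CCR = (sensitivity + specificity) / 2.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / n_pos + tn / n_neg)


def undersample_indices(y, n_subsets, seed=0):
    """Return n_subsets balanced index lists (hypothetical helper).

    Each list holds every minority-class index plus an equal-sized
    random sample, without replacement, of majority-class indices.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]
```

In such a setup, one model would be trained on each balanced subset, the toxic-class scores averaged across models, and the averaged score passed to `classify` before computing `ccr` on held-out data.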