High-dimensional QSAR modelling using penalized linear regression model with L1/2-norm

Algamal, Z. Y.; Lee, M. H.; Al-Fakih, A. M.; Aziz, M.

doi:10.6084/m9.figshare.3830025.v1

gsar_a_1228696_sm2614.docx (1.92 MB)


journal contribution

posted on 2016-09-15, 00:13 authored by Z. Y. Algamal, M. H. Lee, A. M. Al-Fakih, M. Aziz

In high-dimensional quantitative structure–activity relationship (QSAR) modelling, penalization methods have been a popular choice for performing molecular descriptor selection and QSAR model estimation simultaneously. In this study, a penalized linear regression model with an L1/2-norm penalty is proposed. Because the L1/2 penalty is non-convex, the local linear approximation algorithm is used to handle this non-convexity: it replaces the penalty with a weighted L1 approximation, so the estimate is obtained by solving a sequence of convex weighted lasso problems. The potential applicability of the proposed method is tested on several benchmark data sets. Compared with other commonly used penalized methods, the proposed method not only achieves the best predictive ability but also yields an easily interpretable QSAR model. In addition, the applicability domain analysis and the Y-randomization test indicate that the resulting QSAR model is efficient and robust. The results suggest that the proposed method may be a promising penalized method for computational chemistry research, especially when the number of molecular descriptors exceeds the number of compounds.
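To make the idea of combining an L1/2 penalty with the local linear approximation (LLA) concrete, here is a minimal NumPy sketch. It is not the authors' implementation; it only illustrates the general technique under stated assumptions. It minimizes (1/(2n))||y - Xb||^2 + lam * sum_j |b_j|^(1/2) by repeatedly replacing the penalty with its tangent line, which gives a weighted lasso with weights w_j = 1/(2 sqrt(|b_j| + eps)), solved by coordinate descent. Assumptions: descriptors are standardized and the response is centred (no intercept), an ordinary lasso fit is used as the starting estimate, and the names `weighted_lasso_cd` and `l_half_lla`, the values of `lam`, `eps` and `n_lla`, and the synthetic data (more descriptors than compounds) are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_lasso_cd(X, y, lam, weights, beta0=None, max_iter=1000, tol=1e-8):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.

    Assumes centred/standardized columns of X and a centred y (no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n          # X_j'X_j / n for each column
    resid = y - X @ beta                        # residual for the current beta
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # correlation of column j with the partial residual (coordinate j left out)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            # soft-thresholding with a coefficient-specific threshold
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def l_half_lla(X, y, lam, n_lla=5, eps=1e-10):
    """Hypothetical sketch: L1/2-penalized least squares via LLA.

    Each LLA step replaces lam * |b_j|^(1/2) by its tangent at the current
    estimate, i.e. lam * w_j * |b_j| with w_j = 1 / (2 * sqrt(|b_j| + eps)).
    """
    p = X.shape[1]
    beta = weighted_lasso_cd(X, y, lam, np.ones(p))   # step 0: ordinary lasso
    for _ in range(n_lla):
        weights = 1.0 / (2.0 * np.sqrt(np.abs(beta) + eps))
        beta = weighted_lasso_cd(X, y, lam, weights, beta0=beta)
    return beta


# Synthetic example: 50 "compounds", 100 "descriptors", 3 of them relevant.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize descriptors
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y = y - y.mean()                             # centre the response

beta_hat = l_half_lla(X, y, lam=0.1)
selected = np.flatnonzero(beta_hat)
print("selected descriptors:", selected)
print("estimates:", beta_hat[selected].round(2))
```

The design choice behind LLA is visible in the weights: coefficients that are already large receive small weights and are barely shrunk, while coefficients near zero receive very large weights and are driven exactly to zero. This is how the non-convex L1/2 penalty tends to give sparser, less biased descriptor sets than a single lasso fit, while each inner step remains a convex problem.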