Prediction of matrix metal proteinases-12 inhibitors by machine learning approaches
Matrix metalloproteinase-12 (MMP-12) is an important pharmaceutical target for the treatment of many human diseases, and there is an urgent need to design and discover new MMP-12 inhibitors. In this work, four machine learning approaches (support vector machine, k-nearest neighbor, C4.5 decision tree, and random forest) were employed to derive statistical models from datasets with well-distributed biological activities and to predict whether a compound is an MMP-12 inhibitor. Across the models, sensitivity ranges from 96.15% to 98.08%, specificity from 87.23% to 100.00%, overall prediction accuracy from 91.92% to 98.99%, and the Matthews correlation coefficient from 0.8401 to 0.9800, all of which are satisfactory. Using diverse feature selection methods, the different models selected several sets of critical descriptors that carry key information about inhibitory properties, accelerating the classification of MMP-12 inhibitors and non-inhibitors.
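For readers unfamiliar with the four reported measures, the following minimal sketch shows how sensitivity, specificity, overall accuracy, and the Matthews correlation coefficient are conventionally computed from a binary confusion matrix. The function name `classification_metrics` and the counts in the usage lines are hypothetical and purely illustrative; they are not taken from this work, and the code is not the authors' implementation.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification measures from confusion-matrix counts.

    tp: inhibitors predicted as inhibitors
    tn: non-inhibitors predicted as non-inhibitors
    fp: non-inhibitors predicted as inhibitors
    fn: inhibitors predicted as non-inhibitors
    """
    total = tp + tn + fp + fn
    # Sensitivity: fraction of true inhibitors that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true non-inhibitors that are recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Overall accuracy: fraction of all compounds classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 here when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the call.
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The Matthews correlation coefficient is reported alongside accuracy because it accounts for all four cells of the confusion matrix and so remains informative when inhibitors and non-inhibitors are unevenly represented.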
Communicated by Ramaswamy H. Sarma