Improving the biomarker diagnostic capacity via functional transformations

The area under the receiver operating characteristic (ROC) curve (AUC) is overwhelmingly used as an index of diagnostic accuracy in fields such as biomedical science and machine learning, to the point that a larger AUC has become synonymous with better performance. Functional transformations of the marker values have been proposed in the specialized literature as a procedure for increasing the AUC and, therefore, the diagnostic accuracy. However, the classification process relies on regions (classification subsets) that support the decision made: a subject is classified as positive if his or her marker value lies within the region and as negative otherwise. In this paper we study, on a real-world dataset, the capacity of functional transformations to improve the classification performance of univariate biomarkers and the impact of these transformations on the final classification regions. In particular, we consider the problem of determining the gender of a subject from the mode frequency of his or her voice. The shape of the cumulative distribution function of this characteristic in the male and female groups makes the resulting classification problem useful for illustrating the difference between having useful diagnostic rules and obtaining an optimal AUC value. Our point is that improving the AUC by means of a functional transformation can produce classification regions with no practical interpretability. We propose to improve the classification accuracy by making the selection of the classification subsets more flexible while preserving their interpretability. In addition, we provide several graphical approximations that allow a better understanding of the classification problem.
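
The interplay between the AUC, a functional transformation, and the induced classification region can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the voice dataset studied in the paper), the helper names `auc` and `classify` are hypothetical, and the transformation h(x) = |x| is chosen only because, for two zero-mean normal groups with different variances, the likelihood ratio is monotone in x². The empirical AUC is computed as the Mann-Whitney statistic.

```python
import random

def auc(neg, pos):
    """Empirical AUC: P(pos > neg) + 0.5 * P(pos == neg) (Mann-Whitney)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Synthetic marker (assumption): both groups are centered at 0,
# but the positive group is much more spread out.
rng = random.Random(0)
neg = [rng.gauss(0.0, 1.0) for _ in range(500)]
pos = [rng.gauss(0.0, 3.0) for _ in range(500)]

# AUC of the raw marker: the groups overlap symmetrically,
# so "larger values are more likely positive" is a poor rule.
auc_raw = auc(neg, pos)

# AUC after the functional transformation h(x) = |x|.
auc_abs = auc([abs(x) for x in neg], [abs(x) for x in pos])

# A threshold c on the transformed marker, h(x) > c, corresponds on the
# original scale to the two-sided region (-inf, -c) U (c, +inf).
c = 1.5  # illustrative cut-off, not an optimized value

def classify(x, cutoff=c):
    """Return True (positive) when x falls in the induced region."""
    return abs(x) > cutoff

print(f"AUC raw marker:         {auc_raw:.3f}")
print(f"AUC transformed marker: {auc_abs:.3f}")
print("Region on original scale: x < -%.1f or x > %.1f" % (c, c))
```

In this toy setting the transformation raises the AUC, and the induced region is still easy to describe because it is simply the two tails of the marker. With less regular distributions, the preimage of a threshold under an AUC-optimizing transformation may split into several disjoint pieces, which is the interpretability concern raised above.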