Taylor & Francis Group

Leveraging Unlabeled Data for Superior ROC Curve Estimation via a Semiparametric Approach

Version 2 2025-02-26, 13:40
Version 1 2025-01-07, 19:00
dataset
posted on 2025-02-26, 13:40 authored by Menghua Zhang, Mengjiao Peng, Yong Zhou

The receiver operating characteristic (ROC) curve is a widely used tool in fields such as economics, medicine, and machine learning for evaluating classification performance and comparing treatment effects. When estimating the ROC curve, clear and readily available labels are frequently absent for reasons such as labeling cost, time constraints, data privacy, and information asymmetry. Traditional supervised estimators rely solely on labeled data, in which each sample is associated with a fully observed response variable. We propose a new set of semi-supervised (SS) estimators that exploit available unlabeled data (samples lacking observed responses) to enhance estimation precision in a semiparametric setting, assuming that the distribution of the response variable for one group is known up to unknown parameters. By leveraging the flexibility of the kernel smoothing method, the proposed SS estimators have attractive properties such as adaptability and efficiency. We establish the large-sample properties of the SS estimators, which demonstrate that they consistently outperform the supervised estimator under mild assumptions. Numerical experiments provide empirical evidence supporting our theoretical findings. Finally, we showcase the practical applicability of the proposed methodology by applying it to two real datasets.
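
For orientation, the kind of supervised baseline the abstract contrasts against can be sketched as the standard empirical ROC estimator, ROC(t) = 1 - F1(F0^{-1}(1 - t)), computed from labeled data only. This is a minimal illustration under stated assumptions, not the authors' semi-supervised method: the function name `empirical_roc` and its arguments are hypothetical, and higher values are assumed to indicate the positive group.

```python
import numpy as np

def empirical_roc(scores_neg, scores_pos, fpr_grid):
    """Empirical (supervised) ROC curve evaluated on a grid of false positive rates.

    Hypothetical helper for illustration only; uses labeled data from both groups.
    """
    scores_neg = np.asarray(scores_neg, dtype=float)
    scores_pos = np.asarray(scores_pos, dtype=float)
    # Threshold c_t: the (1 - t) quantile of the negative-group values.
    thresholds = np.quantile(scores_neg, 1.0 - np.asarray(fpr_grid, dtype=float))
    # True positive rate: share of positive-group values strictly above each threshold.
    return np.array([(scores_pos > c).mean() for c in thresholds])

# Usage: two perfectly separated groups yield a true positive rate of 1 everywhere.
roc = empirical_roc([0, 1, 2, 3], [10, 11], [0.0, 0.5, 1.0])
```

A semi-supervised estimator of the kind described above would additionally draw on unlabeled samples and a parametric model for one group; that step is not attempted in this sketch.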

Funding

This work was supported by the National Key Research and Development Program (2021YFA1000101, 2021YFA1000102, 2021YFA1000104) and the National Natural Science Foundation of China (12301337, 72331005).

History