Supplementary file: uasa_a_1672556_sm9239.zip (8.31 kB)

Cross-Validation With Confidence

Dataset posted on 2021-09-29, 14:33, authored by Jing Lei

Cross-validation is one of the most popular methods for model and tuning parameter selection in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes this uncertainty into account. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, it achieves consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method offers a different trade-off between prediction accuracy and model interpretability than existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
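To make the idea concrete, here is a minimal sketch (not the article's exact procedure) of a cross-validation confidence set for linear regression: each candidate model is kept unless some competitor's cross-validated loss is significantly smaller. For simplicity the comparison below uses a one-sided paired t-test with a Bonferroni correction over competitors, whereas the article calibrates a related max-type comparison more carefully (via a multiplier bootstrap). The function name `cv_confidence_set`, the candidate feature sets, and the test choice are all illustrative assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def cv_confidence_set(X, y, feature_sets, alpha=0.05, n_splits=5, seed=0):
    """Return indices of candidate models whose CV loss is not significantly
    worse than any competitor's (a rough confidence-set analogue of CV)."""
    n = len(y)
    K = len(feature_sets)
    losses = np.empty((K, n))  # per-observation squared CV loss for each model
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for m, feats in enumerate(feature_sets):
        for train, test in kf.split(X):
            fit = LinearRegression().fit(X[np.ix_(train, feats)], y[train])
            pred = fit.predict(X[np.ix_(test, feats)])
            losses[m, test] = (y[test] - pred) ** 2
    keep = []
    for m in range(K):
        # Reject model m only if some competitor j is significantly better:
        # one-sided paired t-test on per-observation loss differences,
        # Bonferroni-corrected over the K-1 competitors (a crude stand-in
        # for the paper's bootstrap calibration).
        p_values = [
            stats.ttest_rel(losses[m], losses[j], alternative="greater").pvalue
            for j in range(K) if j != m
        ]
        if min(p_values) > alpha / (K - 1):
            keep.append(m)
    return keep

# Example: nested candidate models; the true signal uses features 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)
candidates = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
print(cv_confidence_set(X, y, candidates))
```

The key contrast with plain cross-validation is the output type: instead of the single loss-minimizing model, the sketch returns every model that cannot be statistically distinguished from the best one, so the reported set covers the optimal model with (approximately) the nominal probability.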

Funding

Jing Lei’s research is partially supported by NSF grants DMS-1407771 and DMS-1553884.

History

Version 3 2021-09-29, 14:33
Version 2 2019-10-31, 21:40
Version 1 2019-10-14, 14:03