10.6084/m9.figshare.8398058.v2
Sarah E. Romanes
Sarah E.
Romanes
John T. Ormerod
John T.
Ormerod
Jean Y. H. Yang
Jean Y. H.
Yang
Diagonal Discriminant Analysis With Feature Selection for High-Dimensional Data
Taylor & Francis Group
2019
Asymptotic properties of hypothesis tests
Classification
Feature selection
Latent variables
Likelihood ratio tests
Multiple hypothesis testing
2019-08-16 15:39:38
Dataset
https://tandf.figshare.com/articles/dataset/Diagonal_Discriminant_Analysis_with_Feature_Selection_for_High_Dimensional_Data/8398058
<p>We introduce a new method of performing high-dimensional discriminant analysis (DA), which we call multiDA. Starting from multiclass diagonal DA classifiers, which avoid the problem of high-dimensional covariance estimation, we construct a hybrid model that seamlessly integrates feature selection components. Our feature selection component naturally simplifies to weights that are simple functions of likelihood ratio test statistics, allowing natural comparisons with traditional hypothesis testing methods. We provide heuristic arguments suggesting desirable asymptotic properties of our algorithm with regard to feature selection. We compare our method with several other approaches, showing marked improvements in prediction accuracy, interpretability of chosen features, and run time. We demonstrate these strengths by showing strong classification performance on publicly available high-dimensional datasets, as well as through multiple simulation studies. We make an R package available implementing our approach. <a href="https://doi.org/10.1080/10618600.2019.1637748" target="_blank">Supplementary materials</a> for this article are available online.</p>
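The two ingredients the abstract describes — a diagonal (naive Bayes Gaussian) discriminant classifier and per-feature weights derived from likelihood ratio test statistics — can be sketched as follows. This is a minimal illustrative sketch, not the authors' multiDA implementation (which is released as an R package); the function names, the pooled-variance model, and the simple two-hypothesis LRT (class-specific means vs. a single common mean) are simplifying assumptions for exposition.

```python
import numpy as np

def fit_diagonal_da(X, y):
    """Fit a diagonal DA model: per-class feature means with a pooled
    diagonal (per-feature) variance, avoiding full covariance estimation."""
    classes = np.unique(y)
    n, p = X.shape
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class variance per feature (MLE, diagonal covariance).
    var = np.zeros(p)
    for k, c in enumerate(classes):
        var += ((X[y == c] - means[k]) ** 2).sum(axis=0)
    var /= n
    return classes, priors, means, var

def lrt_feature_stats(X, y):
    """Per-feature likelihood ratio statistic comparing 'class-specific
    means' against 'one common mean' under a diagonal Gaussian model.
    The statistic n * log(var_null / var_alt) is asymptotically
    chi-squared with (K - 1) degrees of freedom under the null."""
    n, _ = X.shape
    var_null = X.var(axis=0)                      # MLE variance, common mean
    _, _, _, var_alt = fit_diagonal_da(X, y)      # MLE variance, class means
    return n * np.log(var_null / var_alt)

def predict(X, classes, priors, means, var):
    """Classify by the highest diagonal-Gaussian log-likelihood plus prior."""
    ll = -0.5 * (((X[:, None, :] - means[None]) ** 2) / var).sum(axis=2)
    ll += np.log(priors)
    return classes[np.argmax(ll, axis=1)]

# Hypothetical usage: 3 of 30 features carry class signal.
rng = np.random.default_rng(0)
n, p = 200, 30
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :3] += 2.0 * y[:, None]
model = fit_diagonal_da(X, y)
stats = lrt_feature_stats(X, y)   # large for the informative features
preds = predict(X, *model)
```

In the full multiDA model these test statistics are integrated into the classifier as feature weights (via latent variables and multiple-testing corrections) rather than used as a hard pre-screening step as sketched here.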