Bayesian screening for feature selection
Biomedical applications such as genome-wide association studies screen large databases with high-dimensional features to identify rare, weakly expressed, yet important continuous-valued features for subsequent detailed analysis. We describe an exact, rapid Bayesian screening approach with attractive diagnostic properties. The approach uses a Gaussian random mixture model and focuses on the missed discovery rate (the probability of failing to identify potentially informative features) rather than the false discovery rate ordinarily used in multiple hypothesis testing. The method provides the probability that a feature merits further investigation, as well as distributions of the effect magnitudes and of the proportion of features with the same expected responses under alternative conditions. Notable properties of the approach include the dependence of the critical values on clinical and regulatory priorities and the direct assessment of the diagnostic properties.
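To make the screening idea concrete, the following is a minimal sketch under simplifying assumptions, not the method described here. It assumes a two-component Gaussian mixture with known parameters (a null component and a single informative component), a one-sided rule that flags a feature when its observed value exceeds a critical value, and it interprets the missed discovery rate as the probability that an informative feature falls below that critical value. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def posterior_informative(x, p, null, alt):
    """Posterior probability that a feature with observed value x is informative.

    Assumes the mixture (1 - p) * null + p * alt with known components.
    """
    num = p * alt.pdf(x)
    den = num + (1.0 - p) * null.pdf(x)
    return num / den


def critical_value(target_mdr, alt):
    """Critical value c such that P(X <= c | informative) equals target_mdr.

    Assumes a one-sided rule: a feature is flagged when x > c.
    """
    return alt.inv_cdf(target_mdr)


def missed_discovery_rate(c, alt):
    """Probability that an informative feature is not flagged (x <= c)."""
    return alt.cdf(c)


# Hypothetical parameters, for illustration only.
null = NormalDist(mu=0.0, sigma=1.0)  # uninformative features
alt = NormalDist(mu=2.0, sigma=1.0)   # informative features
p = 0.05                              # assumed prior proportion of informative features

# Choose the critical value from a tolerated missed discovery rate of 10%.
c = critical_value(0.10, alt)

for x in (0.0, 1.0, 2.0, 3.0):
    flagged = x > c
    prob = posterior_informative(x, p, null, alt)
    print(f"x={x:.1f}  P(informative | x)={prob:.3f}  flagged={flagged}")
```

In this sketch the critical value is set by the tolerated missed discovery rate rather than by a false discovery target, so a stricter tolerance for missed features lowers the threshold and flags more features for follow-up.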