Graph-Assisted Inverse Regression for Count Data and Its Application to Sequencing Data
Multivariate count data, such as sequencing reads in genomics, are often associated with a clinical phenotype of interest. We develop a flexible framework for dimension reduction in regression, where the predictors are correlated counts, by modeling the conditional distribution of the predictors given the response using a pairwise Poisson graphical model. This new framework, called graph-assisted inverse regression for counts, allows us to derive a sufficient reduction of the predictors while adjusting for the dependence structure among them. We propose a regularized criterion for jointly estimating the reduction structure and the network structure. The estimation algorithm can be implemented efficiently on a parallel computer. We also introduce an adaptive version and a sparse variant of the proposed procedure. The methods are evaluated on simulated data and are applied to a gut microbiome sequencing dataset. Supplementary materials for this article are available online.
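As a rough illustration of why modeling the predictors given the response can yield a sufficient reduction, the sketch below writes out one possible pairwise Poisson graphical model in which only the node-wise terms depend on the response. The symbols $f(y)$, $\Gamma$, $\alpha_j$, $\theta_{jk}$, and $A(y)$ are notation assumed here for illustration; they are not taken from the article, whose exact parameterization may differ.

```latex
% Assumed notation (illustrative only):
%   x = (x_1, ..., x_p)  vector of counts,  y  response,
%   f(y) in R^d          a known basis expansion of the response,
%   Gamma = (gamma_1, ..., gamma_p)^T in R^{p x d}  reduction structure,
%   theta_{jk}           pairwise (graph) parameters, free of y,
%   A(y)                 log-normalizing constant.
\begin{align*}
p(x \mid y)
  &= \exp\Big\{ \sum_{j=1}^{p} \big(\alpha_j + \gamma_j^{\top} f(y)\big)\, x_j
       - \sum_{j=1}^{p} \log(x_j!)
       + \sum_{j<k} \theta_{jk}\, x_j x_k - A(y) \Big\} \\
  &= \underbrace{\exp\big\{ f(y)^{\top} \Gamma^{\top} x - A(y) \big\}}_{g(\Gamma^{\top} x,\; y)}
     \;\cdot\;
     \underbrace{\exp\Big\{ \alpha^{\top} x - \sum_{j=1}^{p} \log(x_j!)
       + \sum_{j<k} \theta_{jk}\, x_j x_k \Big\}}_{h(x)}
\end{align*}
```

Under this assumed form, the response enters the density only through $\Gamma^{\top} x$, so by the factorization criterion the conditional distribution of $X$ given $\Gamma^{\top} X$ does not depend on $y$, and $\Gamma^{\top} X$ is a sufficient reduction. The pairwise parameters $\theta_{jk}$ carry the dependence structure among the counts (for unbounded Poisson counts they typically need to be non-positive for the density to be normalizable), and a regularized criterion would penalize both $\Gamma$ and the $\theta_{jk}$.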