A semi-parametric Bayesian approach for detection of gene expression heterosis with RNA-seq data
Heterosis refers to the superior performance of a hybrid offspring over its two inbred parents. Although heterosis has been widely observed in agriculture, its molecular mechanism is not well studied. Recent advances in high-throughput genomic technologies such as RNA sequencing (RNA-seq) facilitate the investigation of heterosis at the gene expression level. However, identifying genes that exhibit heterosis from RNA-seq data is challenging because a large number of hypothesis tests must be conducted with limited sample sizes. Furthermore, detecting heterosis genes requires testing composite null hypotheses involving multiple mean expression levels, rather than the simple null hypotheses tested in differential expression analysis. In this manuscript, we formulate a statistical model whose parameters directly reflect heterosis status, and we develop a powerful test to detect heterosis genes. We employ a Bayesian framework in which the RNA-seq count data are modeled through a Poisson-Gamma mixture, with Dirichlet processes as priors for the distributions of the parameters of interest: the fold changes between each parent and the hybrid. Markov chain Monte Carlo sampling with a Gibbs algorithm provides posterior inference for detecting heterosis genes while controlling the false discovery rate. Simulation results demonstrate that our proposed method outperforms other methods used to detect gene expression heterosis.
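To make the composite hypothesis concrete, the following is a minimal sketch, not the model developed in this manuscript. It assumes the common convention that a gene shows high-parent heterosis when the hybrid mean exceeds both parental means and low-parent heterosis when it falls below both, which is equivalent to both log fold changes sharing a sign. The function names and the idea of summarizing posterior draws by a simple proportion are illustrative assumptions.

```python
import math


def log_fold_changes(mu_parent1, mu_parent2, mu_hybrid):
    """Log fold changes of the hybrid mean relative to each parental mean."""
    return (math.log(mu_hybrid / mu_parent1), math.log(mu_hybrid / mu_parent2))


def heterosis_status(delta1, delta2):
    """Classify a gene from its two log fold changes (assumed convention)."""
    if delta1 > 0 and delta2 > 0:
        return "high-parent"  # hybrid mean above both parents
    if delta1 < 0 and delta2 < 0:
        return "low-parent"   # hybrid mean below both parents
    return "none"             # hybrid mean lies between (or equals) the parents


def posterior_heterosis_prob(draws):
    """Fraction of posterior draws (delta1, delta2) showing either kind of heterosis.

    `draws` is a hypothetical list of MCMC samples for one gene.
    """
    hits = sum(1 for d1, d2 in draws if heterosis_status(d1, d2) != "none")
    return hits / len(draws)
```

For example, `heterosis_status(*log_fold_changes(10, 20, 30))` classifies a gene whose hybrid mean exceeds both parental means as high-parent heterosis. The null region (the hybrid mean lying anywhere between the two parents) is a set rather than a single point, which is what makes the null hypothesis composite.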