## Detecting differentially expressed genes of heterogeneous and positively skewed data using half Johnson’s modified *t*-test

*Background*: Microarray technology allows the expression of thousands of genes to be measured simultaneously in a single experiment. The Student’s *t*-test (for the two-sample situation) can be used to compare the mean expression of a gene, taken from replicate arrays, in order to detect differential expression under the conditions being studied, such as a disease. However, a general-purpose statistical test may have insufficient power to correctly detect differentially expressed genes when the data are heterogeneous and positively skewed. *Methods*: Here we define a differentially expressed gene as one whose expression differs significantly in mean, variance, or both between the two groups of microarrays. Monte Carlo simulation shows that the “half Johnson’s modified *t*-test” maintains quite accurate type I error rates under both normal and non-normal distributions. In addition, the half Johnson’s modified *t*-test is more powerful overall than the half Student’s *t*-test when the ratio of standard deviations between the case and control groups is greater than 1. *Results*: Analysis of a colon cancer data set shows that, when the false discovery rate (FDR) is controlled at 0.05, the half Johnson’s modified *t*-test detects 429 differentially expressed genes, more than the 344 detected by the half Student’s *t*-test. To target 100 priority genes, the FDR needs to be set to only 4.28 × 10^{−8} for the half Johnson’s modified *t*-test, whereas for the half Student’s *t*-test it must be set to 5.39 × 10^{−4}. *Conclusions*: The half Johnson’s modified *t*-test is recommended for the detection of differentially expressed genes in heterogeneous and only positively skewed data.
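The gene counts reported at an FDR of 0.05 come from applying a multiple-testing rule to the per-gene *p*-values. The abstract does not say which FDR procedure was used, so the following is only a minimal sketch that assumes the Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the *p*-values in the usage lines are hypothetical and are not taken from the colon cancer data.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a boolean mask of the hypotheses rejected at the given FDR.

    Assumption: Benjamini-Hochberg step-up procedure; the source abstract
    does not state which FDR method the authors applied.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices that sort p ascending
    ranked = p[order]
    thresholds = fdr * np.arange(1, m + 1) / m   # i * fdr / m for rank i
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])         # largest rank meeting its threshold
        rejected[order[: k + 1]] = True          # reject it and all smaller p-values
    return rejected


# Hypothetical per-gene p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
mask = benjamini_hochberg(example_p, fdr=0.05)
n_detected = int(mask.sum())
```

Under this assumption, the number of genes "detected" by a test is the number of `True` entries in the mask, so a test that yields smaller *p*-values for truly differential genes will pass more of them at the same FDR level.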