An Extension of Generalized Linear Models to Finite Mixture Outcome Distributions

Finite mixture distributions arise in sampling a heterogeneous population. Data drawn from such a population will exhibit extra variability relative to any single subpopulation. Statistical models based on finite mixtures can assist in the analysis of categorical and count outcomes when standard generalized linear models (GLMs) cannot adequately express variability observed in the data. We propose an extension of GLMs where the response follows a finite mixture distribution and the regression of interest is linked to the mixture’s mean. This approach may be preferred over a finite mixture of regressions when the population mean is of interest; here, only one regression must be specified and interpreted in the analysis. A technical challenge is that the mixture’s mean is a composite parameter that does not appear explicitly in the density. The proposed model maintains its link to the regression through a certain random effects structure and is completely likelihood-based. We consider typical GLM cases where means are either real-valued, constrained to be positive, or constrained to be on the unit interval. The resulting model is applied to two example datasets through Bayesian analysis. Supporting the extra variation is seen to improve residual plots and produce widened prediction intervals reflecting the uncertainty. Supplementary materials for this article are available online.
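
To make the phrase "composite parameter" concrete, the following is a minimal sketch in notation that is assumed here and not taken from the article: $J$ components with mixing proportions $\pi_j$, component densities $f_j$ with means $\mu_j$ and variances $\sigma_j^2$, a link function $g$, a covariate vector $x$, and regression coefficients $\beta$. It records only standard facts about finite mixtures and does not attempt to reproduce the random effects structure of the proposed model.

```latex
% Assumed notation (not from the article): J, \pi_j, f_j, \mu_j, \sigma_j^2, g, x, \beta
\begin{align*}
f(y) &= \sum_{j=1}^{J} \pi_j f_j(y), \qquad \pi_j \ge 0, \quad \sum_{j=1}^{J} \pi_j = 1, \\
\mu &= \mathrm{E}(Y) = \sum_{j=1}^{J} \pi_j \mu_j, \qquad g(\mu) = x^\top \beta, \\
\mathrm{Var}(Y) &= \sum_{j=1}^{J} \pi_j \sigma_j^2 + \sum_{j=1}^{J} \pi_j (\mu_j - \mu)^2 .
\end{align*}
```

Under these assumptions, $\mu$ depends jointly on every $\pi_j$ and $\mu_j$ and is not itself an argument of $f$, which is why linking a regression to it requires additional structure. The second term of the variance is nonnegative and vanishes only when all components with positive weight share a common mean, so differing component means push the spread above the weighted average of the within-component variances.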