Taylor & Francis Group

Generalized Principal Component Analysis: Projection of Saturated Model Parameters

Journal contribution posted on 2019-09-19, 19:38; authored by Andrew J. Landgraf and Yoonkyung Lee

Principal component analysis (PCA) is very useful for a wide variety of data analysis tasks, but its implicit connection to the Gaussian distribution can be undesirable for discrete data such as binary and multi-category responses or counts. We generalize PCA to handle various types of data using the generalized linear model framework. In contrast to the existing approach of matrix factorizations for exponential family data, our generalized PCA provides low-rank estimates of the natural parameters by projecting the saturated model parameters. This difference in formulation leads to the favorable properties that the number of parameters does not grow with the sample size and simple matrix multiplication suffices for computation of the principal component scores on new data. A practical algorithm which can incorporate missing data and case weights is developed for finding the projection matrix. Examples on simulated and real count data show the improvement of generalized PCA over standard PCA for matrix completion, visualization, and collaborative filtering. Supplementary material for this article is available online.
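The two computational claims in the abstract, projecting saturated-model natural parameters and scoring new data by matrix multiplication, can be illustrated for Poisson counts. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the centering vector `mu`, and the finite stand-in `m` for the natural parameter of a zero count (where log(0) is negative infinity) are illustrative choices, and `U` is a placeholder orthonormal matrix rather than a projection fitted by the method in the article.

```python
import numpy as np

def saturated_poisson_params(x, m=4.0):
    """Natural parameters of the saturated Poisson model, log(x).

    Zero counts would give -inf, so they are replaced by -m.
    The value of m is an assumption made for this sketch.
    """
    x = np.asarray(x, dtype=float)
    theta = np.full(x.shape, -m)
    np.log(x, out=theta, where=x > 0)  # entries with x == 0 keep the value -m
    return theta

def pc_scores(x_new, U, mu, m=4.0):
    """Principal component scores for new rows: one matrix multiplication."""
    return (saturated_poisson_params(x_new, m) - mu) @ U

def low_rank_natural_params(x, U, mu, m=4.0):
    """Rank-k estimate: project centered saturated parameters onto span(U)."""
    return mu + pc_scores(x, U, mu, m) @ U.T

rng = np.random.default_rng(0)
d, k = 6, 2

# Placeholder d x k matrix with orthonormal columns (NOT a fitted projection).
U, _ = np.linalg.qr(rng.normal(size=(d, k)))
mu = np.zeros(d)  # hypothetical centering vector

x_new = rng.poisson(3.0, size=(5, d))
scores = pc_scores(x_new, U, mu)
theta_hat = low_rank_natural_params(x_new, U, mu)
print(scores.shape, theta_hat.shape)
```

Because `U` has only d × k entries and is applied row by row, scoring additional observations requires no refitting, which is the property the abstract contrasts with matrix-factorization approaches.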

Funding

This research was supported in part by the National Science Foundation (grant nos. DMS-12-09194 and DMS-15-13566).

History