Generalized Principal Component Analysis: Projection of Saturated Model Parameters

Andrew J. Landgraf, Yoonkyung Lee
2019 Figshare  
Principal component analysis (PCA) is very useful for a wide variety of data analysis tasks, but its implicit connection to the Gaussian distribution can be undesirable for discrete data such as binary and multi-category responses or counts. We generalize PCA to handle various types of data using the generalized linear model framework. In contrast to the existing approach of matrix factorizations for exponential family data, our generalized PCA provides low-rank estimates of the natural parameters by projecting the saturated model parameters. This difference in formulation leads to the favorable properties that the number of parameters does not grow with the sample size and simple matrix multiplication suffices for computation of the principal component scores on new data. A practical algorithm that can incorporate missing data and case weights is developed for finding the projection matrix. Examples on simulated and real count data show the improvement of generalized PCA over standard PCA for matrix completion, visualization, and collaborative filtering. Supplementary material for this article is available online.
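The claim that scores on new data need only a matrix multiplication can be illustrated with a minimal sketch for Poisson counts. Everything named here is an assumption for illustration and is not taken from the article: the function names, the cap m that replaces log(0), and in particular the projection matrix U, which is obtained from a plain SVD of the centered saturated parameters as a stand-in for the authors' fitting algorithm.

```python
import numpy as np

def saturated_natural_params(counts, m=5.0):
    """Poisson saturated natural parameters: log(x).

    log(0) is -inf, so zeros are mapped to -m (m is an assumed cap).
    """
    counts = np.asarray(counts, dtype=float)
    theta = np.full(counts.shape, -m)
    np.log(counts, out=theta, where=counts > 0)
    return theta

def fit_placeholder_projection(theta, k):
    """Stand-in fit: column means plus top-k right singular vectors.

    This is NOT the article's algorithm; it only yields an orthonormal
    d x k matrix U so the scoring step can be shown.
    """
    mu = theta.mean(axis=0)
    _, _, vt = np.linalg.svd(theta - mu, full_matrices=False)
    return mu, vt[:k].T

def scores(counts, mu, U, m=5.0):
    """Principal component scores: one matrix multiplication per batch."""
    return (saturated_natural_params(counts, m) - mu) @ U

rng = np.random.default_rng(0)
train = rng.poisson(3.0, size=(50, 6))
mu, U = fit_placeholder_projection(saturated_natural_params(train), k=2)

# New rows need no refitting: the parameter count is fixed by mu and U.
new = rng.poisson(3.0, size=(4, 6))
new_scores = scores(new, mu, U)

# Low-rank estimate of the natural parameters for the new rows.
theta_hat = mu + new_scores @ U.T
```

Because mu and U have sizes that depend only on the number of variables and the rank, the model does not grow as more rows are scored, which is the property the abstract highlights.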
doi:10.6084/m9.figshare.9883061 fatcat:grvrunqbc5da5bmlyuxzbvujre