Shrinkage fisher information embedding of high dimensional feature distributions

Xu Chen, Yilun Chen, Alfred Hero
2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)
In this paper, we introduce a dimensionality reduction method for clustering high-dimensional empirical distributions. The proposed approach is based on a stabilized information-geometric representation of the feature distributions. The problem of dimensionality reduction on spaces of distribution functions arises in many applications, including hyperspectral imaging, document clustering, and classification of flow cytometry data. Our method is a shrinkage-regularized version of the Fisher information distance, which we call shrinkage FINE (sFINE); it is implemented by Steinian shrinkage estimation of the matrix of Kullback-Leibler distances between feature distributions. The proposed method computes similarities using the shrinkage-regularized Fisher information distance between probability density functions (PDFs) of the data features, then applies Laplacian eigenmaps to the derived similarity matrix to accomplish the embedding and perform clustering. The shrinkage regularization controls the trade-off between bias and variance and is especially well suited to clustering empirical probability distributions of high-dimensional data sets. We show significant gains in clustering performance on both a UCI dataset and a spam dataset. Finally, we demonstrate the superiority of embedding and clustering distributional data using sFINE over other state-of-the-art methods such as non-parametric information clustering, support vector machines (SVM), and sparse K-means.
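The pipeline described above (symmetrized Kullback-Leibler distances between estimated feature PDFs, shrinkage regularization of the distance matrix, and Laplacian eigenmaps for the embedding) can be sketched as follows. Note this is an illustrative sketch, not the paper's method: the shrinkage rule below is a simple convex combination of each entry with the mean off-diagonal distance (weight `lam`), standing in for the exact Steinian estimator, and the function names, histogram-based PDF estimates, and Gaussian-kernel bandwidth choice are all our own assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discretized PDFs (histograms on a common grid)."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def shrinkage_kl_matrix(pdfs, lam=0.2):
    """Symmetrized KL distance matrix, shrunk toward the grand mean of the
    off-diagonal entries. This convex-combination rule is a stand-in for the
    Steinian shrinkage estimator of sFINE, whose details the abstract omits."""
    n = len(pdfs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = kl_divergence(pdfs[i], pdfs[j]) + kl_divergence(pdfs[j], pdfs[i])
            D[i, j] = D[j, i] = d
    target = D[np.triu_indices(n, 1)].mean()
    D_shrunk = (1.0 - lam) * D + lam * target
    np.fill_diagonal(D_shrunk, 0.0)  # keep zero self-distance
    return D_shrunk

def laplacian_eigenmaps(D, dim=2, sigma=None):
    """Embed a distance matrix via Laplacian eigenmaps."""
    if sigma is None:
        sigma = np.median(D[D > 0])          # heuristic bandwidth (assumption)
    W = np.exp(-(D ** 2) / (2 * sigma ** 2)) # Gaussian affinity from distances
    np.fill_diagonal(W, 0.0)
    deg = np.diag(W.sum(axis=1))
    L = deg - W                              # unnormalized graph Laplacian
    # Generalized eigenproblem L v = lambda * deg * v; drop the trivial
    # constant eigenvector and keep the next `dim` coordinates.
    _, vecs = eigh(L, deg)
    return vecs[:, 1:dim + 1]

# Toy demo: histogram PDFs drawn from two well-separated Gaussians.
rng = np.random.default_rng(0)
pdfs = []
for mu in [-2.0] * 5 + [2.0] * 5:
    samples = rng.normal(mu, 1.0, 2000)
    h, _ = np.histogram(samples, bins=50, range=(-6, 6), density=True)
    pdfs.append(h)
D = shrinkage_kl_matrix(pdfs)
Y = laplacian_eigenmaps(D, dim=2)
```

Clustering would then be performed on the rows of `Y` with any standard method (e.g. K-means), since the embedding places distributionally similar PDFs near each other.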
doi:10.1109/acssc.2011.6190349 dblp:conf/acssc/ChenCH11