Modified PCA and PLS: Towards a better classification in Raman spectroscopy–based biological applications
Journal of Chemometrics
Raman spectra of biological samples often exhibit variations that originate from changes in spectrometers, measurement conditions, and cultivation conditions. Such unwanted variations make classification extremely challenging, especially when they are larger than the differences between the groups to be separated. A classifier is susceptible to such unwanted variations (i.e., intragroup variations) and can fail to learn the patterns that help separate different groups (i.e., intergroup differences). This often leads to poor generalization performance and degraded transferability of the trained model. A natural solution is to separate the intragroup variations from the intergroup differences and to build the classifier based only on the latter, for example, by a well-designed feature extraction. This is the idea behind this contribution. Herein, we modified two commonly applied feature extraction approaches, principal component analysis (PCA) and partial least squares (PLS), to extract only the features representing the intergroup differences. Both methods were verified with two Raman spectral datasets, measured from bacterial cultures and from colon tissues of mice, respectively. In comparison to ordinary PCA and PLS, the modified PCA improved the prediction on testing data that differ significantly from the training data, while the modified PLS could help avoid overfitting and lead to a more stable classification.
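The general idea of building features from intergroup differences alone can be illustrated with a different, well-known technique: between-group PCA, which derives its loadings only from the (size-weighted) group means and therefore ignores the within-group spread. This is a minimal sketch under stated assumptions, not the modified PCA proposed in this work. The function name `between_group_pca` and its arguments are hypothetical, and the sketch assumes the spectra are stored row-wise in a NumPy array with one group label per spectrum.

```python
import numpy as np


def between_group_pca(X, y, n_components):
    """Hypothetical sketch of between-group PCA (not the authors' modified PCA).

    X: (n_samples, n_features) spectra, one spectrum per row.
    y: (n_samples,) group labels.
    Returns scores, loadings (n_features, k) and the grand mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    grand_mean = X.mean(axis=0)

    # Rows are sqrt(n_g) * (group mean - grand mean), so M.T @ M is the
    # between-group scatter matrix; within-group spread never enters M.
    M = np.vstack([
        np.sqrt(np.sum(y == g)) * (X[y == g].mean(axis=0) - grand_mean)
        for g in groups
    ])

    # Right singular vectors of M span the intergroup directions,
    # ordered by decreasing between-group variance.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)

    # The weighted rows of M are linearly dependent, so rank(M) <= G - 1.
    k = min(n_components, len(groups) - 1)
    loadings = Vt[:k].T
    scores = (X - grand_mean) @ loadings
    return scores, loadings, grand_mean
```

New spectra would be projected with the same mean and loadings, `(X_new - grand_mean) @ loadings`, before being passed to a classifier. Because the loadings come from the centered group means, at most G − 1 features can be extracted for G groups, as in linear discriminant analysis; a large intragroup variation (for example, an instrument-related offset shared by all groups) does not pull the loadings toward itself, whereas it would dominate the first component of ordinary PCA.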