Fisher discriminant analysis with kernels

S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.-R. Müller
Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468)  
A non-linear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick, which allows the efficient computation of Fisher's discriminant in feature space. The linear classification in feature space corresponds to a (powerful) non-linear decision function in input space. Large-scale simulations demonstrate the competitiveness of our approach.

DISCRIMINANT ANALYSIS

In classification and other data-analytic tasks it is often necessary to pre-process the data before applying the algorithm at hand, and it is common to first extract features suitable for the task to be solved. Feature extraction for classification differs significantly from feature extraction for describing data. For example, PCA finds directions which have minimal reconstruction error by describing as much variance of the data as possible with m orthogonal directions. But these first directions need not (and in practice often will not) reveal the class structure that we need for proper classification. Discriminant analysis addresses the following question: given a data set with two classes, say, which is the best feature or feature set (either linear or non-linear) to discriminate the two classes? Classical approaches tackle this question by starting from the (theoretically) optimal Bayes classifier; by assuming normal distributions for the classes, standard algorithms like quadratic or linear discriminant analysis, among them the famous Fisher discriminant, can be derived (e.g. [5]). Of course, any model other than a Gaussian for the class distributions could be assumed.
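To make the kernel trick above concrete: the linear Fisher discriminant maximizes the Rayleigh quotient of between- to within-class scatter, J(w) = (w^T S_B w) / (w^T S_W w), and the kernel version expands the direction w over the training points, so everything can be computed from kernel evaluations alone. The following is a minimal NumPy sketch of a regularized kernel Fisher discriminant for two classes. It is an illustration under our own assumptions, not the authors' code: the RBF kernel, the function names (rbf_kernel, kfd_fit), and the regularization constant mu are hypothetical choices made for this sketch.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kfd_fit(X, y, gamma=1.0, mu=1e-3):
    # Illustrative sketch of a kernel Fisher discriminant for two classes
    # (labels 0 and 1). Solves (N + mu*I) alpha = M_1 - M_0, where M_c is
    # the kernelized class mean and N the kernelized within-class scatter;
    # the decision function is f(x) = sum_i alpha_i k(x_i, x) + b.
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    N = np.zeros((n, n))
    M = []
    for c in (0, 1):
        Kc = K[:, y == c]                            # kernel columns of class c
        lc = Kc.shape[1]
        M.append(Kc.mean(axis=1))                    # kernelized class mean M_c
        N += Kc @ (np.eye(lc) - np.full((lc, lc), 1.0 / lc)) @ Kc.T
    # regularize the scatter matrix and solve for the expansion coefficients
    alpha = np.linalg.solve(N + mu * np.eye(n), M[1] - M[0])
    # place the threshold midway between the projected class means
    b = -0.5 * (alpha @ (M[0] + M[1]))
    return alpha, b

# Toy usage: two Gaussian blobs; the sign of the score gives the class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.repeat([0, 1], 50)
alpha, b = kfd_fit(X, y, gamma=0.5)
scores = rbf_kernel(X, X, gamma=0.5) @ alpha + b

With a linear kernel k(x, y) = x^T y this reduces to the classical Fisher direction; the non-linearity of the decision function in input space enters only through the choice of kernel.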
doi:10.1109/nnsp.1999.788121