Non-Parametric Kernel Learning with robust pairwise constraints
International Journal of Machine Learning and Cybernetics
Abstract Existing kernel-learning-based semi-supervised clustering algorithms generally scale poorly with large datasets and large numbers of pairwise constraints. In this paper, we propose a new Non-Parametric Kernel Learning framework (NPKL) to address these problems. We generalize the graph embedding framework to kernel learning by reformulating it as a semidefinite programming (SDP) problem, smoothing the functional Hilbert space with Laplacian regularization while avoiding over-smoothing. We propose two algorithms to solve this problem. One is a straightforward algorithm that solves the original kernel learning problem directly via semidefinite programming, denoted TRAnsductive Graph Embedding Kernel learning (TRAGEK); the other relaxes the SDP problem and solves it with a constrained gradient descent algorithm. To accelerate learning, we further divide the data into groups and use the sub-kernels of these groups to approximate the whole kernel matrix. This algorithm is denoted Efficient Non-PArametric Kernel Learning (ENPAKL). The advantages of the proposed NPKL framework are: 1) supervised information in the form of pairwise constraints can be easily incorporated; 2) it is robust to the number of pairwise constraints, i.e., the number of constraints has little effect on the running time; 3) ENPAKL is comparatively efficient relative to related kernel learning algorithms, since it is a constrained gradient descent based algorithm. Experiments on clustering with the learned kernels show that the proposed framework scales well with the size of datasets and the number of pairwise constraints. Further experiments on image segmentation indicate the potential advantages of the proposed algorithms over the traditional k-means and N-cut clustering algorithms in terms of segmentation accuracy.

Keywords Kernel learning · semidefinite programming · graph embedding · pairwise constraint · semi-supervised learning

1 Introduction

Semi-supervised clustering based on kernel learning is a popular research topic in machine learning, since one can incorporate the information of a limited number of labeled data or a set of pairwise constraints into the kernel learning framework. The reason is that for clustering, the pairwise constraints provide useful information about which data pairs belong to the same category and which do not.
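For intuition, a common way to represent such supervision (an illustrative sketch, not notation taken from this paper) is a signed constraint matrix, with +1 entries for must-link pairs and -1 entries for cannot-link pairs; a kernel then "agrees" with the supervision when its entries are large on must-link pairs and small on cannot-link pairs:

```python
import numpy as np

# Hypothetical example: encode pairwise constraints over 5 points as a
# signed matrix T, with T[i, j] = +1 for a must-link pair (same cluster)
# and T[i, j] = -1 for a cannot-link pair (different clusters).
n = 5
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 4)]

T = np.zeros((n, n))
for i, j in must_link:
    T[i, j] = T[j, i] = 1.0
for i, j in cannot_link:
    T[i, j] = T[j, i] = -1.0

def constraint_agreement(K, T):
    """Score how well kernel K matches the constraints: the sum of
    T[i, j] * K[i, j] is high when must-link similarities are large
    and cannot-link similarities are small."""
    return float(np.sum(T * K))
```

Note that the cost of building and scoring such a matrix depends only on the number of constrained pairs, which is one way a method can stay robust to the number of constraints.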
To learn such kernel matrices, Kulis et al. proposed constructing a graph-based kernel matrix that unifies vector-based and graph-based semi-supervised clustering. A further refinement of learning kernel matrices for clustering was investigated by Li et al. In their approach, data are implicitly projected onto a feature space that is a unit hyperball, subject to a collection of pairwise constraints. However, the above clustering algorithms via kernel matrices either cannot scale well with the increasing number of pairwise constraints and the amount of data, or lack a theoretical guarantee of the positive semi-definiteness of the kernel matrices. From another aspect, Yeung et al. proposed an efficient kernel learning algorithm through low-rank matrix approximation. However, in their algorithm, the kernel matrix is assumed to be a linear combination of several base kernel matrices. Since this might reduce the dimension of the hypothesis kernel space, we call such algorithms parametric kernel learning. In addition, Cortes et al. proposed a kernel learning algorithm based on non-linear combinations of kernels, which generalizes the linear combination case but still lies within the framework of parametric kernel learning. Addressing these two limitations is the major purpose of this paper. On the other hand, we note that many algorithms based on the graph embedding framework achieve enhanced discriminant ability by utilizing marginal information, e.g., pushing dissimilar data points near the margin as far apart as possible while compacting the points in the same class [30, 31]. It is therefore worthwhile to generalize the graph embedding framework to kernel learning 1 . Based on the aforementioned goals, in this paper we propose a new scalable kernel learning framework, NPKL (Non-Parametric Kernel Learning with robust pairwise constraints), and apply it to semi-supervised clustering.
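The basic ingredient behind relaxing an SDP and using constrained gradient descent can be sketched as follows (a minimal illustration of the general technique, not the paper's actual ENPAKL update): take a gradient step on the objective, then project the iterate back onto the positive semidefinite cone by clipping negative eigenvalues.

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the positive semidefinite cone
    by zeroing its negative eigenvalues (the Frobenius-norm projection)."""
    M = (M + M.T) / 2.0          # symmetrize against numerical drift
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)    # clip negative eigenvalues to zero
    return (V * w) @ V.T

def projected_gradient_step(K, grad, step=0.1):
    """One constrained gradient descent step on a kernel matrix: move
    along the negative gradient, then restore positive semidefiniteness."""
    return project_psd(K - step * grad)
```

The eigendecomposition in the projection costs O(n^3), which is why block or sub-kernel approximations are attractive for larger datasets.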
First, we generalize the graph embedding framework to a feature space assumed to be a possibly infinite-dimensional subspace of the l2 Hilbert space with unit norm, similar to . Then the unknown feature projection function φ is implicitly learned by transforming the criterion of graph embedding (i.e., maximizing the sum of distances

1 Note that although we learn a kernel matrix from the perspective of graph embedding, our approach has little relationship with algorithms built on the graph embedding framework such as marginal Fisher analysis (MFA) . The reason is that such algorithms aim at supervised learning for classification, so there is no need to compare the proposed algorithm with them.
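Graph-embedding-style criteria can be evaluated purely through a kernel, without knowing φ explicitly. As a small illustration (a generic identity, not this paper's exact objective): for an affinity graph W, the weighted sum of squared feature-space distances equals 2·tr(KL) with Laplacian L = D − W, using only kernel entries via ||φ(x_i) − φ(x_j)||² = K_ii + K_jj − 2K_ij.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric affinity matrix."""
    return np.diag(W.sum(axis=1)) - W

def embedding_smoothness(K, W):
    """Sum over ordered pairs of w_ij * ||phi(x_i) - phi(x_j)||^2, computed
    from the kernel alone. By the standard identity this equals
    2 * trace(K @ L) for L = D - W."""
    L = graph_laplacian(W)
    return 2.0 * np.trace(K @ L)
```

Minimizing tr(KL) over an intrinsic (must-link) graph while maximizing it over a penalty (cannot-link) graph is the usual way such criteria turn pairwise supervision into a kernel learning objective.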