New representations in genetic programming for feature construction in k-means clustering

by Andrew Lensen, Bing Xue, Mengjie Zhang

Released as a post by Victoria University of Wellington Library.

2020  

Abstract

© Springer International Publishing AG 2017. k-means is one of the most fundamental and well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve the performance of data mining algorithms by performing feature construction: the process of combining multiple attributes (features) of a dataset to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
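To make the idea of feature construction concrete, the following is a minimal sketch and not the representation proposed in the paper. It assumes a synthetic dataset of two concentric rings and a single hand-written expression tree, sqrt(add(mul(x, x), mul(y, y))), standing in for a program that GP might evolve; no GP search is performed. The names kmeans, constructed_feature, ring and agreement are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels


def constructed_feature(x, y):
    """Hypothetical evolved tree: sqrt(add(mul(x, x), mul(y, y)))."""
    return (math.sqrt(x * x + y * y),)


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius (assumed data)."""
    return [
        (radius * math.cos(2 * math.pi * i / n),
         radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def agreement(labels, truth):
    """Fraction of points matching truth, up to swapping the two cluster ids."""
    same = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    return max(same, 1 - same)


data = ring(1.0, 40) + ring(5.0, 40)
truth = [0] * 40 + [1] * 40

# k-means on the original (x, y) features.
raw_score = agreement(kmeans(data, 2), truth)

# k-means on the single constructed feature.
built = [constructed_feature(x, y) for x, y in data]
built_score = agreement(kmeans(built, 2), truth)

print("original features:  ", raw_score)
print("constructed feature:", built_score)
```

With two centroids, k-means splits the original feature space by a straight line, which cannot separate one ring from the ring that surrounds it. The constructed feature maps each ring to a single radius value, so the same algorithm can separate the two groups. In a GP setting, such expressions would be searched for automatically rather than written by hand.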

Archived Files and Locations

application/pdf   393.5 kB
file_ikpyj4dvojcg7a3ouqjpsunizy
s3-ap-southeast-2.amazonaws.com (publisher)
web.archive.org (webarchive)
Type  post
Stage   unknown
Date   2020-10-06