New representations in genetic programming for feature construction in k-means clustering
by Andrew Lensen, Bing Xue, Mengjie Zhang
2020
Abstract
© Springer International Publishing AG 2017. k-means is one of the fundamental and most well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve the performance of data mining algorithms by performing feature construction: the process of combining multiple attributes (features) of a dataset to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
Date 2020-10-06