How have high-impact scientific studies designing their experiments on mixed data clustering? A systematic map to guide better choices

Nádia Junqueira Martarelli, Marcelo Seido Nagano
2021 Machine Learning with Applications  
Abstract

Many scientific works on grouping mixed data have chosen different experimental scenarios to structure their findings, which involve decisions ranging from the programming language to the use of real-world and/or simulated datasets and performance measures, for example. Because these characteristics directly influence the conclusions of the studies and the way new scientific works are proposed, it would be useful to have a wide map of the main choices made by the authors of high-impact scientific documents, so that the community can reflect on the thematic direction, identify best practices, and propose new paths for future research. To the best of our knowledge, such a map does not exist, nor does a methodological procedure to build it. Therefore, this paper proposes a systematic methodology for building such maps and provides a wide and in-depth map of the main choices that the authors of high-impact scientific documents on mixed data clustering and surrounding studies have made in their experiments. As a result, 160 documents were systematically selected, classified into one of six classes of data clustering approaches, and individually tabulated. From the tables for each class, we found, for instance, that real-world datasets are used more frequently than simulated ones; that external indices are used most often, followed by internal and relative ones; and that authors do not commonly report the programming language they used, except in the partitional class. We also provide the address of the algorithms' code when the authors have made it available.
doi:10.1016/j.mlwa.2021.100056 fatcat:x5hu5syftzbyzbdvwsewwl43lm