A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
A Diagnostic Approach to Assess the Quality of Data Splitting in Machine Learning
[article]
2022
arXiv
pre-print
In machine learning, a routine practice is to split the data into a training set and a test set. A proposed model is built on the training data, and its performance is then assessed on the test data. Usually, the data are split randomly into a training and a test set on an ad hoc basis. This approach, which hinges on random splitting, can work well, but more often than not it fails to gauge the generalizing capability of the model with respect to perturbations in the input of training
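The routine practice the abstract describes (random split, fit on the training part, score on the test part) can be sketched as follows. This is only an illustration of that baseline workflow, not the diagnostic the paper proposes. The toy data, the least-squares line model, and the helper names `random_split`, `fit_line`, and `mse` are all assumptions introduced here.

```python
import random


def random_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train):
    """Ordinary least squares for y = a + b * x on (x, y) pairs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(model, rows):
    """Mean squared error of the fitted line on (x, y) pairs."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows)


# Hypothetical toy data: a noisy straight line, not taken from the paper.
noise = random.Random(42)
data = [(x, 2.0 * x + 1.0 + noise.gauss(0, 1)) for x in range(40)]

# Repeat the ad hoc random split with a few seeds and report the test error.
for seed in range(3):
    train, test = random_split(data, seed=seed)
    model = fit_line(train)
    print(seed, round(mse(model, test), 3))
```

Running the loop with several seeds shows how the reported test error depends on which random split happened to be drawn, which is the kind of sensitivity the abstract raises.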
arXiv:2206.11721v1
fatcat:6ewzwjc4j5drnl6ftxpqdumvfu